I'm trying to understand GAS's behavior of .code16.
From the manual, it seems that in a 16-bit section, instructions with 32-bit operands get a 66H operand-size override prefix in their encoding. Does that mean
.code16
movl %eax, %ebx
is legal in that mode? If so, does that mean the code cannot run on a 16-bit processor?
These are legal instructions for the 80386 and later.
Starting with the 80386 we can use operand-size and address-size override prefixes. These prefixes can be used in combination with the 16-bit address mode and with the 32-bit address mode.
Additionally, they can be used in real-address mode, in protected mode, and in virtual-8086 mode. These prefixes reverse the default operand size and/or address size for a single instruction in the code segment. The default operand size and address size are specified by the D flag in the code-segment descriptor (or, if there is no GDT/LDT, we are in the 16-bit address mode after the BIOS POST completes).
In the 16-bit address mode we have to add these prefixes if we want to use 32-bit operands and/or 32-bit addresses; without them, only 16-bit operands and addresses are available.
In the 32-bit address mode it is the other way around: we leave the prefixes out to use 32-bit operands and/or addresses, and add them to use 16-bit operands and/or addresses.
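To answer the question concretely, here is what you can expect GAS to do under .code16 (a sketch; the encodings in the comments are what a 16-bit code segment implies, so treat them as illustrative):
.code16
movw %ax, %bx        # 89 c3       no prefix needed in a 16-bit code segment
movl %eax, %ebx      # 66 89 c3    GAS adds the 66H operand-size prefix
movl (%esi), %eax    # needs both 66H (operand-size) and 67H (address-size) prefixes
Only the 80386 and later decode the 66H/67H prefixes, so the code assembles fine but will not behave correctly on an 8086 or 80286, which answers the second part of the question.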
Intel:
Instruction prefixes can be used to override the default operand size and address size of a code segment. These prefixes can be used in real-address mode as well as in protected mode and virtual-8086 mode. An operand-size or address-size prefix only changes the size for the duration of the instruction.
The following two instruction prefixes allow mixing of 32-bit and 16-bit operations within one segment:
The operand-size prefix (66H)
The address-size prefix (67H)
These prefixes reverse the default size selected by the D flag in the code-segment descriptor. For example, the processor can interpret the (MOV mem, reg) instruction in any of four ways:
In a 32-bit code segment:
Moves 32 bits from a 32-bit register to memory using a 32-bit effective address.
If preceded by an operand-size prefix, moves 16 bits from a 16-bit register to memory using a 32-bit effective address.
If preceded by an address-size prefix, moves 32 bits from a 32-bit register to memory using a 16-bit effective address.
If preceded by both an address-size prefix and an operand-size prefix, moves 16 bits from a 16-bit register to memory using a 16-bit effective address.
In a 16-bit code segment:
Moves 16 bits from a 16-bit register to memory using a 16-bit effective address.
If preceded by an operand-size prefix, moves 32 bits from a 32-bit register to memory using a 16-bit effective address.
If preceded by an address-size prefix, moves 16 bits from a 16-bit register to memory using a 32-bit effective address.
If preceded by both an address-size prefix and an operand-size prefix, moves 32 bits from a 32-bit register to memory using a 32-bit effective address.
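To make the four 16-bit-segment cases concrete, here is one possible set of encodings (my own sketch, not part of the Intel text, using opcode 89 /r with ModRM byte 18h):
89 18               mov [bx+si], bx
66 89 18            mov [bx+si], ebx
67 89 18            mov [eax], bx
67 66 89 18         mov [eax], ebx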
The previous examples show that any instruction can generate any combination of operand size and address size regardless of whether the instruction is in a 16- or 32-bit segment. The choice of the 16- or 32-bit default for a code segment is normally based on the following criteria:
Performance — Always use 32-bit code segments when possible. They run much faster than 16-bit code segments on P6 family processors, and somewhat faster on earlier IA-32 processors.
The operating system the code segment will be running on — If the operating system is a 16-bit operating system, it may not support 32-bit program modules.
Mode of operation — If the code segment is being designed to run in real-address mode, virtual-8086 mode, or SMM, it must be a 16-bit code segment.
Backward compatibility to earlier IA-32 processors — If a code segment must be able to run on an Intel 8086 or Intel 286 processor, it must be a 16-bit code segment.
The D flag in a code-segment descriptor determines the default operand-size and address-size for the instructions of a code segment. (In real-address mode and virtual-8086 mode, which do not use segment descriptors, the default is 16 bits.) A code segment with its D flag set is a 32-bit segment; a code segment with its D flag clear is a 16-bit segment.
Executable code segment. The flag is called the D flag and it indicates the default length for effective addresses and operands referenced by instructions in the segment. If the flag is set, 32-bit addresses and 32-bit or 8-bit operands are assumed; if it is clear, 16-bit addresses and 16-bit or 8-bit operands are assumed. The instruction prefix 66H can be used to select an operand size other than the default, and the prefix 67H can be used to select an address size other than the default.
The 32-bit operand prefix can be used in real-address mode programs to execute the 32-bit forms of instructions. This prefix also allows real-address mode programs to use the processor’s 32-bit general-purpose registers.
The 32-bit address prefix can be used in real-address mode programs, allowing 32-bit offsets.
The IA-32 processors beginning with the Intel386 processor can generate 32-bit offsets using an address override prefix; however, in real-address mode, the value of a 32-bit offset may not exceed FFFFH without causing an exception.
Assembler Usage:
If a code segment that is going to run in real-address mode is defined, it must be set to a USE 16 attribute. If a 32-bit operand is used in an instruction in this code segment (for example, MOV EAX, EBX), the assembler automatically generates an operand prefix for the instruction that forces the processor to execute a 32-bit operation, even though its default code-segment attribute is 16-bit.
The 32-bit operand prefix allows a real-address mode program to use the 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI).
When moving data in 32-bit mode between a segment register and a 32-bit general-purpose register, the Pentium Pro processor does not require the use of a 16-bit operand size prefix; however, some assemblers do require this prefix. The processor assumes that the 16 least-significant bits of the general-purpose register are the destination or source operand. When moving a value from a segment selector to a 32-bit register, the processor fills the two high-order bytes of the register with zeros.
AMD:
3.3.2. 32-Bit vs. 16-Bit Address and Operand Sizes
The processor can be configured for 32-bit or 16-bit address and operand sizes. With 32-bit address and operand sizes, the maximum linear address or segment offset is FFFFFFFFH (2^32-1), and operand sizes are typically 8 bits or 32 bits. With 16-bit address and operand sizes, the maximum linear address or segment offset is FFFFH (2^16-1), and operand sizes are typically 8 bits or 16 bits.
When using 32-bit addressing, a logical address (or far pointer) consists of a 16-bit segment selector and a 32-bit offset; when using 16-bit addressing, it consists of a 16-bit segment selector and a 16-bit offset. Instruction prefixes allow temporary overrides of the default address and/or operand sizes from
within a program.
When operating in protected mode, the segment descriptor for the currently executing code segment defines the default address and operand size. A segment descriptor is a system data structure not normally visible to application code. Assembler directives allow the default addressing and operand size to be chosen for a program. The assembler and other tools then set up the segment descriptor for the code segment appropriately.
When operating in real-address mode, the default addressing and operand size is 16 bits. An address-size override can be used in real-address mode to enable 32-bit addressing; however, the maximum allowable 32-bit linear address is still 000FFFFFH (2^20-1).
3.6. OPERAND-SIZE AND ADDRESS-SIZE ATTRIBUTES
When the processor is executing in protected mode, every code segment has a default operand-size attribute and address-size attribute. These attributes are selected with the D (default size) flag in the segment descriptor for the code segment (see Chapter 3, Protected-Mode Memory Management, in the Intel Architecture Software Developer’s Manual, Volume 3). When the D flag is set, the 32-bit operand-size and address-size attributes are selected; when the flag is clear, the 16-bit size attributes are selected. When the processor is executing in real-address mode, virtual-8086 mode, or SMM (System-Management-Mode), the default operand-size and address-size attributes are always 16 bits.
The operand-size attribute selects the sizes of operands that instructions operate on. When the 16-bit operand-size attribute is in force, operands can generally be either 8 bits or 16 bits, and when the 32-bit operand-size attribute is in force, operands can generally be 8 bits or 32 bits.
The address-size attribute selects the sizes of addresses used to address memory: 16 bits or 32 bits. When the 16-bit address-size attribute is in force, segment offsets and displacements are 16 bits. This restriction limits the size of a segment that can be addressed to 64 KBytes. When the 32-bit address-size attribute is in force, segment offsets and displacements are 32 bits, allowing segments of up to 4 GBytes to be addressed.
The default operand-size attribute and/or address-size attribute can be overridden for a particular instruction by adding an operand-size and/or address-size prefix to an instruction (see “Instruction Prefixes” in Chapter 2 of the Intel Architecture Software Developer’s Manual, Volume 3). The effect of this prefix applies only to the instruction it is attached to.
Table 3-1 shows effective operand size and address size (when executing in protected mode) depending on the settings of the D flag and the operand-size and address-size prefixes.
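Table 3-1 itself is not reproduced here; the relationships it describes boil down to the following summary (my own paraphrase, not a verbatim copy of the table):
D flag = 0 (16-bit segment):  no prefix  -> 16-bit operands, 16-bit addresses
                              66H prefix -> 32-bit operands
                              67H prefix -> 32-bit addresses
D flag = 1 (32-bit segment):  no prefix  -> 32-bit operands, 32-bit addresses
                              66H prefix -> 16-bit operands
                              67H prefix -> 16-bit addresses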
In C we use & to get the address of a variable and * to dereference a pointer.
int variable=10;
int *pointer;
pointer = &variable;
How do I do this in NASM x86 assembly language?
I read the NASM manual and found that [variable_address] works like dereferencing (I may be wrong).
section .data
variable db 'A'
section .text
global _start
_start:
mov eax , 4
mov ebx , 1
mov ecx , [variable]
mov edx , 8
int 0x80
mov eax ,1
int 0x80
I executed this code and it prints nothing. I can't understand what is wrong with my code.
I need your help to understand pointers and dereferencing in NASM x86.
There are no variables in assembly. (*)
variable db 'A'
does several things. It defines the assembly-time symbol variable, which is like a bookmark into memory, containing the address of *here* at assembly time. It's the same thing as putting a label on an otherwise empty line, like:
variable:
The db 'A' directive is "define byte", and you give it a single byte value to be defined, so it produces a single byte in the resulting machine code with value 0x41 (65 in decimal). That's the value of the capital letter A in ASCII encoding.
Then:
mov ecx , [variable]
loads 4 bytes from memory starting at address variable, which means the low 8 bits of ecx will contain the value 65, and the upper 24 bits will contain whatever junk happens to reside in the 3 bytes following the 'A'. (Had you used db 'ABCD', ecx would end up equal to 0x44434241, the letters 'D' 'C' 'B' 'A' appearing "reversed" because of the little-endian encoding of dword values on x86.)
But sys_write expects ecx to hold the address of the memory where the bytes to write are stored, so you need instead:
mov ecx, variable
In NASM that loads the address of the data into ecx.
(In MASM/TASM this would instead assemble as mov ecx,[variable], and to get the address you have to use mov ecx, OFFSET variable; if you happen to find some MASM/TASM example, be aware of the syntax difference.)
*) Some more info about "no variables": keep in mind that in assembly you are at the machine level. At the machine level there is computer memory, which is addressable by bytes (on the x86 platform! There are platforms where memory is addressable in different units; they are not common, but you may find some in the microcontroller world). So by using a memory address you can access particular byte(s) in physical memory (which physical location in the memory chip is addressed depends on your platform; a modern OS will usually give a user application a virtual address space, translated to physical addresses by the CPU on the fly, transparently, without bothering user code about that translation).
All the higher-level concepts like "variables", "arrays", "strings", etc. are just bunches of byte values in memory, and all of that logical meaning is given to the data by the instructions being executed. When you look at the data without the context of the instructions, it is just byte values in memory, nothing more.
So if you are not precise with your code and you access a single-byte "variable" with an instruction that fetches a dword, as in your mov ecx,[variable] example, there is nothing wrong with that from the machine's point of view: it will happily fetch 4 bytes of memory into the ecx register, and NASM does not bother to warn you that you are probably reading beyond your original variable definition. This looks like stupid behaviour if you think in terms of "variables" and other high-level-language concepts, but assembly is not intended for that kind of hand-holding; full control over the machine is its main purpose, and if you want to fetch 4 bytes, you can. It just requires a tremendous amount of precision and attention to detail: stay aware of your memory layout and use instructions with the intended memory operand sizes, like movzx ecx, byte [variable] to load only a single byte from memory and zero-extend it into the full 32-bit value in the target ecx register.
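Putting it together, a corrected version of the program might look like this (a sketch; I also set edx to 1 since there is only one byte to write, and clear ebx before exiting):
section .data
variable db 'A'

section .text
global _start
_start:
    mov eax, 4          ; sys_write
    mov ebx, 1          ; fd 1 = stdout
    mov ecx, variable   ; ecx = address of the byte, not its value
    mov edx, 1          ; length: exactly 1 byte
    int 0x80
    mov eax, 1          ; sys_exit
    xor ebx, ebx        ; exit status 0
    int 0x80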
What happens to instruction pointers when address overrides are used to target a smaller address space e.g. the default is 32-bit address but the override converts to 16?
So, let's say we're in x86-32 mode and the default is a 32-bit memory space for the current code segment we're in.
Further, the IP register contains the value 87654321h.
If I use 67h to override the default and make the memory space 16-bit for just that one instruction, how does the processor compute the offset into the current code segment?
Some bits in the IP have to be ignored, otherwise you'd be outside the 16-bit memory space specified by the override.
So, does the processor just ignore the 8765 part in the IP register?
That is, does the processor just use the 16 least-significant bits and ignore the 16 most-significant bits?
What about address overrides associated with access to data segments?
For example, we're in x86-32 mode, the default is 32 bit memory addressing and we use 67h prefix for this instruction: mov eax, [ebx].
Now, ebx contains a 32 bit number.
Does the 67h override change the above instruction to: mov eax, [bx]?
What about "constant pointers"? Example: mov eax, [87654321].
Would the 67h override change it to mov eax, [4321]?
Does the memory override affect the offset into the data segment also or just the code segment?
How do address overrides affect the stack pointer?
If the stack pointer contains a 32 bit number (again we'll use 87654321h) and I push or pop, what memory is referenced?
Pushing and popping indirectly accesses memory.
So, would only the 4321h part of the stack pointer be used, ignoring the most significant bits?
Also, what about the segment bases themselves?
Example: we're in x86-32 mode, default 32 bit memory space, but we use 67h override.
The CS register points to a descriptor in the GDT whose segment base is, again lol, 87654321h.
We're immediately outside of the 16-bit memory range without even adding an offset.
What does the processor do? Ignore the 16 most significant bits? The same question can be applied to the segment descriptors for the data and stack segments.
0x67 is the address-size prefix. It changes the interpretation of an addressing mode in the instruction.
It does not put the machine temporarily into 16-bit mode or truncate EIP to 16-bit, or affect any other addresses that don't explicitly come from an [addressing mode] in the instruction.
For push/pop, the instruction reference manual entry for push says:
The address size is used only when referencing a source operand in memory.
So in 32-bit mode, a16 push eax would still set esp-=4 and then store [esp] = eax. It would not truncate ESP to 16 bits. The prefix would have no effect, because the only memory operand is implicit not explicit.
push [ebx] is affected by the 67 prefix, though.
db 0x67
push dword [ebx]
would decode as push dword [bp+di], and load 32 bits from that 16-bit address (ignoring the high 16 bits of those registers). (16-bit addressing modes use a different ModRM encoding than 32/64-bit ones, with no optional SIB byte.)
However, it would still update the full esp, and store to [esp].
(For the effective-address encoding details, see Intel's volume 2 PDF, Chapter 2: INSTRUCTION FORMAT, table 2-1 (16-bit) vs. table 2-2 (32-bit).)
In 64-bit mode, the address-size prefix would turn push [rbx] into push [ebx].
Since some forms of push can be affected by the address-size prefix, this might not fall into the category of meaningless prefixes, use of which is reserved and may produce unpredictable behaviour in future CPUs. (What happens when you use a memory override prefix but all the operands are registers?). OTOH, that may only apply to the push r/m32 opcode for push, not for the push r32 short forms that can't take a memory operand.
I think the way it's worded, Intel's manual really doesn't guarantee that even the push r/m32 longer encoding of push ebx wouldn't decode as something different in future CPUs with a 67 prefix.
For example, we're in x86-32 mode, the default is 32 bit memory addressing and we use 67h prefix for this instruction: mov eax, [ebx].
Now, ebx contains a 32 bit number.
Does the 67h override change the above instruction to: mov eax, [bx]?
What about "constant pointers"? Example: mov eax, [87654321].
Would the 67h override change it to mov eax, [4321]?
The address size override doesn't just change the size of the address, it actually changes the addressing scheme.
A 67 override on mov eax, [ebx] changes it to mov eax, [bp+di].
A 67 override on mov eax, [87654321] changes it to mov eax, [di] (followed by and [ebx+65], eax and some xchg instruction).
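A byte-level view makes this clearer (a sketch; the decodings follow Intel's ModRM tables 2-1 and 2-2):
8B 03               ; mov eax, [ebx]          ModRM 03 means [ebx] with 32-bit addressing
67 8B 03            ; mov eax, [bp+di]        the same ModRM 03 means [bp+di] with 16-bit addressing
8B 05 21 43 65 87   ; mov eax, [0x87654321]   ModRM 05 means [disp32] with 32-bit addressing
67 8B 05            ; mov eax, [di]           ModRM 05 means [di] with 16-bit addressing,
                    ;   so the 21 43 65 87 bytes that followed now decode as the next instruction(s)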
Let's say we have two machines on a network MA and MB,
MA treats the order of the bits in a byte as little-endian,
while MB treats the order of the bits in a byte as big-endian.
How do MA and MB agree on what "endianness" to use for the bits in a byte during
communication over the network?
Is there a standard "network endianness"?
Do socket programmers have to take any action to ensure correct communication?
For example, HTTP is a text protocol, which means machines send and receive bytes that represent characters. What if the bit-level endianness used for those characters differs between the machines?
Yes, the hardware protocols specify the bit order of bytes on all network links. This is generally handled automatically by the NIC hardware.
See, for example, this description of Ethernet frame format.
Ethernet transmission is strange, in that the byte order is big-endian (the leftmost byte is sent first), but the bit order is little-endian (the rightmost bit, or LSB (least significant bit), of each byte is sent first).
Check this page: http://www.comptechdoc.org/independent/networking/protocol/protlayers.html
It suggests that byte ordering is done at the Presentation Layer, which is quite high up. This, however, relates specifically to the application that you are using. I suspect data at the lower levels (wrapping the higher levels) has a predetermined byte and bit order.
I'm learning about registers. It looks like 32-bit registers are divided up so that they can be accessed as 8-bit registers. This looks very inefficient. Performance would be improved if they didn't do this. So why do they do it?
Also, it costs extra money to design them like this. Why not make the CPU cheaper by not doing it?
Because if you're only dealing with 8-bit values, it'd be inefficient to have to issue all the masking operations needed to limit those 32/64-bit registers to just the 8 bits you're working on.
So, x86 registers have
AH/AL = high/low 8 bits of the 16-bit register
AX = the whole 16-bit register
EAX = the whole 32-bit register
It's far more efficient, in terms of instruction size, to have
mov ah, 0xXX (2 bytes)
rather than forcing
mov ax, 0x00XX (4 bytes in 32-bit code, because of the 66H operand-size prefix)
mov eax, 0x000000XX (5 bytes)
As for "designing the cpu to make it cheaper" - it's for backwards compatibility. All modern x86 processors are actually internally a RISC design, with a major chunk of silicon dedicated to taking the x86 instructions coming in and converting them into the CPU's own internal micro-ops (which is basically a RISC instruction set).
The Intel 8080, which was the first "mainstream" microprocessor, had seven main 8-bit registers (A, B, C, D, E, H, and L). Because memory addresses were 16 bits, instructions that needed to use a non-constant memory operand would use a pair of registers (most commonly H and L, but sometimes B and C, or D and E) to form the address. Because the registers in the aforementioned pairs were often used together to represent 16-bit values, there were a few instructions which could operate upon the register pairs as 16-bit quantities. An instruction to add BC to HL would perform the addition by adding C to L, and then by adding B to H (plus a carry if needed). I'm not familiar enough with the 4004 or 8008 (the two predecessors of the 8080) to know if either of them did anything similar in its architecture.
When Intel produced the 8088, they included a full 16-bit arithmetic unit, but they wanted code which was written for the 8080 to be easily convertible to their new architecture. On the 8080, a lot of code had been written to "manually" form addresses out of the 8-bit parts, since doing so was often much faster than using the 16-bit instructions to do the math. For example, if one needed to access some specified table of 256 entries with an index stored in A, one could have done something like (Zilog notation shown, but the 8080 had the same instructions):
ld hl,(baseOfTable) ; 16-bit address
ld c,a
ld b,#0
add hl,bc
ld a,(hl)
but if one could make certain the table was aligned on a 256-byte boundary, one could simplify the code considerably:
ld l,a
ld a,(tableBaseMSB) ; Just load the MSB--assume the LSB is zero
ld h,a
ld a,(hl)
With the 8088 instruction set, it wouldn't terribly often be useful for code written "from scratch" to access the upper and lower parts of registers separately, but there was a lot of code written for the 8080 which used such techniques, and Intel wanted to make it easy for people to convert such code for use on the 8088. Allowing registers to be built from 8-bit pieces was helpful in that regard.
Incidentally, there was another advantage to Intel's architecture: since it included four 16-bit only registers and four registers which could be used as either one 16-bit or two 8-bit parts, that made it possible for code to hold 12 values in registers if eight of them were 255 or less, or eleven values if six of them were 256 or less, etc. When using architectures with more registers, eking out an extra register here and there isn't quite so important, but on the 8088 it was often very helpful.
The ability to address portions of the registers has no effect on their performance when used as 32-bit registers. In that case, this capability just isn't used.
CPUs, regardless of their native bit size, need to manipulate 8-bit values very, very often. Strings of text, for example, are frequently manipulated as consecutive 8-bit values. International character sets are often manipulated as sets of consecutive 16-bit values. So being able to operate rapidly on 8-bit and 16-bit values is of tremendous importance.
If you're asking as a practical matter for x86 CPUs, it's too late. The very first PC CPUs didn't even have 32-bit registers, and compatibility has been retained all the way through.
Backwards compatibility. Processor manufacturers did not want to break compatibility with old software. This is the main reason why x86_64 processors still support 16-bit software (virtual-8086 mode). If you look closely, you'll see that the majority of the features in the x86 architecture are shaped by compatibility concerns. I'm not hating.
I've got to learn assembly and I'm very confused as to what the different registers do/point to.
On some architectures, like MIPS, all registers are created equal, and there is really no difference beyond the name of the register (and software conventions). On x86 you can mostly use any registers for general-purpose computing, but some registers are implicitly bound to the instruction set.
Lots of information about special purposes for registers can be found here.
Examples:
eax, accumulator: many arithmetic instructions implicitly operate on eax. There are also special shorter EAX-specific encodings for many instructions: add eax, 123456 is 1 byte shorter than add ecx, 123456, for example. (add eax, imm32 vs. add r/m32, imm32)
ebx, base: few implicit uses, but xlat is one that matches the "Base" naming. Still relevant: cmpxchg8b. Because it's rarely required for anything specific, some 32-bit calling-conventions / ABIs use it as a pointer to the "global offset table" in Position Independent Code (PIC).
edx, data: some arithmetic operations implicitly operate on the 64-bit value in edx:eax
ecx, counter: used for shift counts and for rep movs. Also, the mostly-obsolete loop instruction implicitly decrements ecx.
esi, source index: some string operations read a string from the memory pointed to by esi
edi, destination index: some string operations write a string to the memory pointed to by edi. e.g. rep movsb copies ECX bytes from [esi] to [edi].
ebp, base pointer: normally used to point to local variables. Used implicitly by leave.
esp, stack pointer: points to the top of the stack, used implicitly by push, pop, call and ret
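Several of these implicit uses can be seen in a short sketch (NASM syntax; assume esi, edi, and ecx are already set up, so this is only an illustration):
cld                 ; DF = 0: esi and edi advance forward
rep movsb           ; implicitly uses ecx (count), esi (source), edi (destination)
mov eax, 12345
mov ecx, 1000
mul ecx             ; the 64-bit product lands implicitly in edx:eax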
The x86 instruction set is a complex beast, really. Many instructions have shorter forms that implicitly use one register or another. Some registers can be used to do certain addressing while others cannot.
The Intel 80386 Programmer's Reference Manual is an irreplaceable resource; it basically tells you everything there is to know about x86 assembly, except for newer extensions and performance on modern hardware.
The PC Assembly (e)book is a great resource for learning assembly.
The sp register is the stack pointer, used for stack operation like push and pop.
The stack is known as a LIFO structure (last-in, first-out), meaning that the last thing pushed on is the first thing popped off. It's used, among other things, to implement the ability to call functions.
The bp register is the base pointer, and is commonly used for stack frame operations.
This means that it's a fixed reference to locate local variables, passed parameters and so forth on the stack, for a given level (while sp may change during the execution of a function, bp usually does not).
If you're looking at assembly language like:
mov eax, [ebp+8]
you're seeing the code access a stack-level-specific variable.
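A typical stack-frame setup that produces that kind of access looks like this (a sketch in NASM syntax, assuming the usual cdecl-style layout):
push ebp            ; save the caller's base pointer
mov  ebp, esp       ; ebp now marks this function's frame
sub  esp, 16        ; reserve 16 bytes for local variables
mov  eax, [ebp+8]   ; first argument passed on the stack
mov  [ebp-4], eax   ; store it in a local variable
mov  esp, ebp       ; release the locals
pop  ebp            ; restore the caller's frame
ret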
The si register is the source index, typically used for mass copy operations (di is its equivalent destination index). Intel had these registers along with specific instructions for quick movement of bytes in memory.
The e- variants are just the 32-bit versions of these (originally) 16-bit registers. And, as if that weren't enough, we have 64-bit r- variants as well :-)
Perhaps the simplest place to start is here. It's specific to the 8086 but the concepts haven't changed that much. The simplicity of the 8086 compared to the current crop will be a good starting point for your education. Once you've learned the basics, it will be much easier to move up to the later members of the x86 family.
Transcribed here and edited quite a bit, to make the answer self-contained.
GENERAL PURPOSE REGISTERS
8086 CPU has 8 general purpose registers, each register has its own name:
AX - the accumulator register (divided into AH/AL). Probably the most commonly used register for general purpose stuff.
BX - the base address register (divided into BH/BL).
CX - the count register (divided into CH/CL). Special purpose instructions for looping and shifting.
DX - the data register (divided into DH/DL). Used with AX for some MUL and DIV operations, and for specifying ports in some IN and OUT operations.
SI - source index register. Special purpose instruction to use this as a source of mass memory transfers (DS:SI).
DI - destination index register. Special purpose instruction to use this as a destination of mass memory transfers (ES:DI).
BP - base pointer, primarily used for accessing parameters and variables on the stack.
SP - stack pointer, used for the basic stack operations.
SEGMENT REGISTERS
CS - points at the segment containing the current instruction.
DS - generally points at segment where variables are defined.
ES - extra segment register, it's up to a coder to define its usage.
SS - points at the segment containing the stack.
Although it is possible to store any data in the segment registers, this is never a good idea. The segment registers have a very special purpose - pointing at accessible blocks of memory.
Segment registers work together with general purpose register to access any memory value. For example, if we would like to access memory at the physical address 12345h, we could set the DS = 1230h and SI = 0045h. This way we can access much more memory than with a single register, which is limited to 16 bit values.
The CPU makes a calculation of the physical address by multiplying the segment register by 10h and adding the general purpose register to it (1230h * 10h + 45h = 12345h):
12300   (1230h shifted left by one hex digit, i.e. multiplied by 10h)
+0045   (the offset)
=====
12345   (the physical address)
The address formed with 2 registers is called an effective address.
This usage is for real mode only (which is the only mode the 8086 had). Later processors changed these registers from segments to selectors and they are used to lookup addresses in a table, rather than having a fixed calculation performed on them.
By default BX, SI and DI registers work with DS segment register; and BP and SP work with SS segment register.
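A small example of this default pairing (a sketch in real-mode NASM syntax; the segment values match the calculation above):
mov ax, 1230h
mov ds, ax          ; DS = 1230h
mov si, 0045h
mov al, [si]        ; loads from DS:SI = 1230h * 10h + 45h = physical address 12345h
mov al, [bp+2]      ; BP-based addresses use SS by default, not DS
mov al, [es:di]     ; an explicit segment override selects ES instead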
SPECIAL PURPOSE REGISTERS
IP - the instruction pointer:
Always points to the next instruction to be executed.
It is an offset relative to CS.
The IP register always works together with the CS segment register, and together they point to the next instruction to be executed.
FLAGS REGISTER
Determines the current state of the processor. These flags are modified automatically by the CPU after arithmetic operations; this makes it possible to determine the type of the result and to decide on conditions for transferring control to other parts of the program.
Generally you cannot access these flags directly.
Carry Flag CF - this flag is set to 1 when there is an unsigned overflow. For example when you add bytes 255 + 1 (result is not in range 0...255). When there is no overflow this flag is set to 0.
Parity Flag PF - this flag is set to 1 when there is even number of one bits in result, and to 0 when there is odd number of one bits.
Auxiliary Flag AF - set to 1 when there is an unsigned overflow for low nibble (4 bits).
Zero Flag ZF - set to 1 when result is zero. For non-zero result this flag is set to 0.
Sign Flag SF - set to 1 when result is negative. When result is positive it is set to 0. (This flag takes the value of the most significant bit.)
Trap Flag TF - used for single-step (on-chip) debugging; when set, the CPU raises a debug exception after each instruction.
Interrupt enable Flag IF - when this flag is set to 1 CPU reacts to interrupts from external devices.
Direction Flag DF - this flag is used by some instructions to process data chains, when this flag is set to 0 - the processing is done forward, when this flag is set to 1 the processing is done backward.
Overflow Flag OF - set to 1 when there is a signed overflow. For example, when you add bytes 100 + 50 (result is not in range -128...127).
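For example (a sketch; 8-bit additions chosen so the flag results are easy to verify by hand):
mov al, 255
add al, 1        ; result 0:   CF=1 (unsigned overflow), ZF=1, OF=0
mov al, 100
add al, 50       ; result 150: OF=1 (signed overflow, 150 > 127), CF=0, SF=1, ZF=0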
Here's a simplified summary:
ESP is the current stack pointer, so you generally only update it to manipulate the stack. EBP is intended for stack manipulation too, for example saving the value of ESP before allocating stack space for local variables; but you can use EBP as a general-purpose register as well.
ESI is the Extended Source Index register, "string" (different from C-string, and I don't mean the type of C-string women wear either) instructions like MOVS use ESI and EDI.
Memory Addressing:
x86 CPUs have special registers called "segment registers"; each of them can refer to a different base address. For example, the segment selected by DS (commonly called the data segment) may start at 0x1000000, and the one selected by SS (commonly called the stack segment) may start at 0x2000000.
When you use EBP or ESP in an address, the default segment register is SS; for ESI (and the other general-purpose registers) it is DS. For example, let's say the DS base is 0x1000000, the SS base is 0x2000000, and EBP=0x10, ESI=0x10, so:
mov eax, [ebp]   ; loads from address 0x2000000 + 0x10 (SS is the default for EBP)
mov eax, [esi]   ; loads from address 0x1000000 + 0x10 (DS is the default for ESI)
You can also specify a segment register to use, overriding the default:
mov eax,ds:[ebp]
In terms of addition, subtraction, logical operations, etc, there's no real difference between them.