In my x86 ASM textbook there is an example of a macro which writes a string in the standard output stream:
%macro PRINT 1
pusha
pushf
jmp %%astr
%%str db %1, 0
%%strln equ $-%%str
%%astr: _syscall_write 1, %%str, %%strln
popf
popa
%endmacro
What I cannot understand is why we push and pop all the general-purpose registers to and from the stack. _syscall_write does not modify any register except EAX, which holds the result of the system call. So why not push and pop only EAX? Wouldn't that be more efficient?
Are we enclosing the variable or register in brackets to specify a pointer in assembly?
Example1;
MOV eax, array+4
LEA eax, [array+4]
Example2;
section .data
array DB 116,97
section .bss
variable RESB 0
section .text
global _start:
_start:
mov eax,[array]
;exit
mov eax,1
int 0x80
I am not getting any errors while compiling or running the above code. Is the address of the zero index of the array placed in the EAX register?
Example3;
INC [variable]
When compiling the above code, I am getting the "operation size not specified" error. And why can't the command be used as INC variable?
Example4;
section .data
array DB 116,97
section .bss
variable RESB 97
section .text
global _start:
_start:
mov eax,4
mov ebx,1
mov ecx,variable
mov edx,1
int 0x80
;exit
mov eax,1
int 0x80
And this code is not working.
Are we enclosing the variable or register in brackets to specify a
pointer in assembly?
Example1;
MOV eax, array+4
LEA eax, [array+4]
The brackets work like the dereference operator in C (*ptr): they access the value stored at the address computed inside the square brackets. In this example, though, both instructions leave the same value in eax. The first moves the address array + 4 into eax as an immediate. The second uses lea, which computes the effective address of its memory operand and loads that address into eax without reading memory. So despite the brackets, lea does not dereference anything.
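As a rough C analogy of address versus value (C is used only for illustration; the array contents and function names below are made up):

```c
#include <stddef.h>

/* Hypothetical data standing in for the NASM label `array`. */
static unsigned char array[8] = {116, 97, 10, 20, 30, 40, 50, 60};

/* Like `mov eax, array+4` or `lea eax, [array+4]`: yields the address only. */
unsigned char *address_of_fifth(void)
{
    return array + 4;
}

/* Like `movzx eax, byte [array+4]`: reads the byte stored at that address. */
unsigned int value_of_fifth(void)
{
    return *(array + 4);
}
```

The first function never touches the bytes of the array; only the second one performs a memory read.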
Example2;
section .data
array DB 116,97
section .bss
variable RESB 0
section .text
global _start:
_start:
mov eax,[array]
;exit
mov eax,1
int 0x80
I am not getting any errors while compiling or running the above code.
Is the address of the zero index of the array placed in the eax
register?
No. Because of the brackets, the instruction loads the value stored at array, not its address (that would be mov eax, array). Since the destination is eax, a 32-bit register, the assembler assumes you want the first 4 bytes at the address array. But there are only 2 bytes at array: 116 and 97, so the upper two bytes of eax come from whatever follows them in memory. This is probably not what you intended. To load the first byte at array into eax, use movzx eax, BYTE [array], which moves array[0] into the low byte of eax and zeroes the upper bytes. mov al, [array] also works, though it leaves the upper bytes of eax unchanged.
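A small C sketch of the difference between the two loads. It assumes the array is padded to 4 bytes so the wide read stays in bounds (the padding and the function names are not in the original), and it spells out the little-endian byte order that x86 uses:

```c
#include <stdint.h>

/* Hypothetical stand-in for the bytes at `array`, padded with two zero bytes. */
static const uint8_t array[4] = {116, 97, 0, 0};

/* Like `movzx eax, byte [array]`: one byte, zero-extended to 32 bits. */
uint32_t load_byte_zero_extended(void)
{
    return (uint32_t)array[0];
}

/* Like `mov eax, [array]` on x86: four consecutive bytes,
   with the lowest address as the least significant byte. */
uint32_t load_dword_little_endian(void)
{
    return (uint32_t)array[0]
         | (uint32_t)array[1] << 8
         | (uint32_t)array[2] << 16
         | (uint32_t)array[3] << 24;
}
```

With this padding the 4-byte load yields 116 + 97 * 256; in the original program the two upper bytes would be whatever happens to follow the array in memory.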
Example3;
INC [variable]
When compiling the above code, I am getting the "operation size not
specified" error. And why can't the command be used as INC variable.
The error says it all. variable is just an address. When you use [], how many bytes should the instruction operate on? You need to specify a size. For example, to increment the first byte, write inc BYTE [variable]. However, in the previous example you reserved zero bytes at variable, so accessing any bytes there may cause problems. As for "And why can't the command be used as INC variable": as I just said, variable is just a label that the assembler translates to an address, which is a constant. inc needs a register or memory operand; you can't change the address that variable translates to.
Example4;
section .data
array DB 116,97
section .bss
variable RESB 97
section .text
global _start:
_start:
mov eax,4
mov ebx,1
mov ecx,variable
mov edx,1
int 0x80
;exit
mov eax,1
int 0x80
And this code is not working.
It may seem not to print anything, but it actually does. Memory reserved in .bss is zero-initialized, so printing the first byte at variable writes a NUL character. Most terminals display nothing for NUL, so it looks as if nothing was printed.
(By the way, are you certain that you know what resb does? In one example, you reserve 0 bytes, and in another, you reserve 97 bytes for no apparent reason. You might want to take another look at what resb actually does.)
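The zero-initialization point can be illustrated in C, where static storage without an initializer behaves like space reserved in .bss. This is only a sketch: it assumes a POSIX system for write(), and the function names are made up.

```c
#include <unistd.h>   /* write(): POSIX, assumed available */

/* Static storage without an initializer is zero-filled,
   much like `variable RESB 97` in section .bss. */
static char variable[97];

/* Writes the first byte of the buffer to fd; returns what write() returns. */
long write_first_byte(int fd)
{
    return (long)write(fd, variable, 1);
}

/* Returns 1 if the byte that would be written is NUL. */
int first_byte_is_nul(void)
{
    return variable[0] == '\0';
}
```

Calling write_first_byte(1) should report one byte written even though the terminal shows nothing, which matches what the assembly program does.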
array ; variable address
byte[array] ; value of first byte of array
word[array] ; value of first word of array
byte[array + 1] ; value of second byte of array
Think of the variable names as pointers; writing size[name] gets the value being pointed to (similar to *name in C, where name is a pointer).
I've been going over the book again and again and cannot understand why this gives me "improper operand type". It should work!
This is inline assembly in Visual Studio.
function(unsigned int* a){
unsigned int num;
_asm {
mov eax, a //This stores address (start of the array) in eax
mov num, dword ptr [eax*4] //This is the line I am having issues with.
In that last line, I am trying to store the 4-byte value that is in the array, but I get error C2415: improper operand type.
What am I doing wrong? How do I copy a 4-byte value from an array into a 32-bit register?
In Visual C++'s inline assembly, all variables are accessed as memory operands (see note 1); in other words, wherever you write num, you can think of the compiler replacing it with dword ptr [ebp - something].
Now, this means that in the last mov you are effectively trying to perform a memory-memory mov, which isn't provided on x86. Use a temporary register instead:
mov eax, dword ptr [a] ; load value of 'a' (which is an address) in eax
mov eax, dword ptr [eax] ; dereference address, and load contents in eax
mov dword ptr [num], eax ; store value in 'num'
Notice that I removed the *4, as it doesn't really make sense to multiply a pointer by four - maybe you meant to use a as base plus some other index?
1 Other compilers, such as gcc, provide means to control way more finely the interaction between inline assembly and compiler generated code, which provides great flexibility and power but has quite a steep learning curve and requires great care to get everything right.
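For comparison, here is roughly what the three-instruction sequence computes when written in plain C, plus an indexed variant. The function names are hypothetical, and the register choice mentioned in the comment (ecx as the index) is only an assumption for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Rough C equivalent of: load the pointer, dereference it, store into num. */
unsigned int first_element(const unsigned int *a)
{
    assert(a != NULL);
    unsigned int num = *a;
    return num;
}

/* With an index, the compiler scales by sizeof(unsigned int), which is 4
   with MSVC on x86, much like an operand of the form [eax + ecx*4]. */
unsigned int element_at(const unsigned int *a, size_t i)
{
    assert(a != NULL);
    return a[i];
}
```

This is where a *4 scale factor normally belongs: on the index, not on the base pointer.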
I have a few issues with using registers and storing data.
Before I read in a character I want a buffer of size 100 that the register ESI points to.
Do I use this?
mov esi, 100 to store a buffer for size 100,
and then
mov esi, [al]
inc esi
to store the current character I entered into the esi and move it to the next location to store a new character?
I also can't find out how to properly check if a null terminated character is entered.
I've tried cmp al, 0xa to check for a new line
and cmp eax, -1 to check eof.
Note: I have a function called read_char to read in a character to put into the al register
To define a buffer in NASM you can use buffer times 100 db 0
You get its address with mov esi, buffer
To store the character in AL there and advance the address, write mov [esi], al followed by inc esi
how to properly check if a null terminated character is entered
The null would be the byte following the character. You need to compare a word for that. Read the character and the following byte, then compare:
mov ax, [esi]
cmp ax, 0x000A
This tests if linefeed was the last item in this zero-terminated string.
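The store-and-advance loop can be sketched in C as follows. Since read_char is the asker's own routine and is not shown, this sketch takes its characters from a string instead; that substitution, the function name, and the choice to stop at a newline are assumptions:

```c
#include <stddef.h>

#define BUFFER_SIZE 100

/*
 * Copies characters from src into buf until a newline, the end of src,
 * or a full buffer. src stands in for repeated read_char calls.
 * Returns the number of characters stored (terminator not counted).
 */
size_t read_line(const char *src, char buf[BUFFER_SIZE])
{
    char *p = buf;                        /* plays the role of ESI */
    while (*src != '\0' && *src != '\n' && p < buf + BUFFER_SIZE - 1) {
        *p = *src;                        /* mov [esi], al */
        p++;                              /* inc esi */
        src++;
    }
    *p = '\0';                            /* zero-terminate the stored string */
    return (size_t)(p - buf);
}
```

The bounds check against BUFFER_SIZE - 1 leaves room for the terminating zero, which the assembly version would also need.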
int sort(int* list)
{
__asm
{
mov esi, [list];
mov eax, dword ptr[esi + edx * 4]; store pointer to eax?
mov edi, dword ptr[esi + 4 + edx * 4]; store pointer to edi?
jmp swap;
swap:
push dword ptr[esi + edx * 4];
mov dword ptr[esi + edx * 4], edi;
pop dword ptr[esi + 4 + edx * 4];
This is a portion of my homework code, it works properly but I want to know how I can change my swap to use registers instead of dword ptrs. I initially had:
swap: (none of this works... values remain unchanged. why? =[ )
push eax; supposed to push value pointed to?
mov eax, edi; supposed to change value pointed at by eax?
pop edi; supposed to pop pushed value into edi pointer location?
but this doesn't actually swap anything; the array passed in doesn't change. How can I rewrite my code so that the swap statement looks like this? I tried putting [] around eax in the above swap statement, but that doesn't work either.
With three instructions (as Kerrek SB said) and only one register (EAX):
int exchange ()
{ int list[5] = {1,5,2,4,3};
__asm { mov edx, 0
lea esi, list
// SWAP WITH THREE INSTRUCTIONS.
mov eax, [esi + edx * 4]
xchg [esi + 4 + edx * 4], eax
mov [esi + edx * 4], eax
// NOW LIST = {5,1,2,4,3};
}
}
Or, with the array as parameter :
int exchange ( int * list )
{ __asm { mov edx, 0
mov esi, list
// SWAP WITH THREE INSTRUCTIONS.
mov eax, [esi + edx * 4]
xchg [esi + 4 + edx * 4], eax
mov [esi + edx * 4], eax
// LIST = {5,1,2,4,3};
}
}
And this is how to call it :
int list[5] = {1,5,2,4,3};
exchange( list );
From what I understand, you want to swap two double-word values in an array, and you want to do this using two registers. You are loading EAX and EDI with the two values. After you swap the register values, you need to store them back into their respective array offsets in memory for the array to change. So continuing your line of code, try:
Push Eax
Mov Eax, Edi
Pop Edi
Mov dword ptr[esi + 4 + edx * 4], Eax
Mov dword ptr[esi + edx * 4], Edi
You can leave out the dword ptr size specifier when the other operand is a 32-bit register; I believe the assembler then infers that the memory operand has the same size (double word). So this will also work:
mov eax, [esi + edx * 4]
mov edi, [esi + 4 + edx * 4]
Also, do you have to use that mode of addressing? It seems you are using indirect indexed displacement addressing.
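Why the register-only swap leaves the array unchanged can be seen in a C sketch, where local variables play the role of EAX and EDI (the function names are hypothetical):

```c
#include <stddef.h>

/* Swaps only local copies, like exchanging EAX and EDI without storing them. */
void swap_copies_only(const int *list, size_t i)
{
    int a = list[i];        /* mov eax, [esi + edx*4] */
    int b = list[i + 1];    /* mov edi, [esi + 4 + edx*4] */
    int tmp = a;
    a = b;
    b = tmp;
    (void)a; (void)b;       /* the array itself is never written */
}

/* Loads both values, then stores each one to the other's slot. */
void swap_adjacent(int *list, size_t i)
{
    int a = list[i];
    int b = list[i + 1];
    list[i] = b;
    list[i + 1] = a;
}
```

Note that once both values are in registers, no exchange of the registers is needed at all: storing each one to the other's location already performs the swap.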
Part of the confusion might be how your function receives its inputs. If you write your whole function in asm, rather than inline with MSVC-specific syntax, then the ABI tells you that your parameters will be on the stack (for 32bit x86 code). http://www.agner.org/optimize/ has a calling-conventions doc, too, covering the various different calling conventions for x86 and x86-64.
Anyway.
xchg might seem like exactly the instruction you want for doing a swap. If you really do need to exchange the contents of two registers, it's very similar in performance to the 3 mov instructions that would otherwise be required, but without the temporary register needed. However, it's somewhat rare to actually need to swap two registers, rather than just overwrite one, or save the old value somewhere else. Also, 3 mov reg, reg will be faster on Ivy Bridge / Haswell, because they don't need an execution unit for it; they just handle it in the register rename stage (with 0 latency).
For swapping the contents of two memory locations, xchg with a memory operand is at least 25 times slower than using mov for loads/stores, because of the implicit LOCK prefix, which forces the CPU to make sure all other cores get the update right away instead of just writing to L1 cache.
What you need to do is 2 loads, and 2 stores.
The simplest form (2 loads, 2 stores, works in the general case) will be
# void swap34(int *array)
swap34:
# 32bit: mov edi, [esp+4] # [esp] has the return address
# 64bit windows: mov rdi, rcx # first function arg comes in rcx
# array pointer in rdi, assuming 64bit SystemV (Linux) ABI.
mov eax, [rdi+8] # load array[2] (the 3rd element)
mov edx, [rdi+12] # load array[3] (the 4th element); edx is call-clobbered, unlike ebx
mov [rdi+12], eax # store array[3] = tmp1
mov [rdi+8], edx # store array[2] = tmp2
ret
With more complex addressing modes (e.g. [rdi+rax*4]), you could swap list[rax] with list[rbx].
If the memory locations are adjacent, you can load both at once with a wider load and rotate to swap, e.g.
# int *array in rdi
mov rax, [rdi+4] # load 2nd and 3rd 32bit element
rol rax, 32 # rotate left by half the reg width
mov [rdi+4], rax # store back to the same place
I believe those 3 instructions will actually run faster than rol [rdi+4], 32. (rotate with memory operand and imm8 count is 4 uops on Intel Sandybridge, throughput of 1 per 2 cycles. The load/rot/store is 3 uops, and should sustain 1 per cycle. The memory-operand version uses fewer instruction bytes. It doesn't leave either value in a register, though. Usually in real code, you're going to want to do something further with one of the values.)
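The load / rotate / store idea can be sketched in portable C. memcpy is used for the wide load and store to stay within C's aliasing rules; the function name is made up, and whether a compiler turns this into the three instructions above is not guaranteed:

```c
#include <stdint.h>
#include <string.h>

/* Swaps array[i] and array[i+1] with one 64-bit load,
   a rotate by 32, and one 64-bit store. */
void swap_adjacent_rotate(uint32_t *array, size_t i)
{
    uint64_t pair;
    memcpy(&pair, &array[i], sizeof pair);   /* wide load of both elements */
    pair = (pair << 32) | (pair >> 32);      /* rotating by half the width swaps the halves */
    memcpy(&array[i], &pair, sizeof pair);   /* wide store back to the same place */
}
```

Because the rotate exchanges the two 32-bit halves, the result does not depend on byte order.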
The only other way I can think of to use fewer instructions would be if you had rsi and rdi pointing at the values to be swapped. Then you could
mov eax, [rdi] ; DON'T DO THIS,
movsd ; string-move, 4B version. copies [rsi] to [rdi]
mov [rsi-4], eax ; IT'S SLOW
This would be a lot slower than 2 loads / 2 stores, and movsd increments rsi and rdi. Saving an instruction here actually makes for slower code, and code that uses MORE space in the uop cache on recent Intel designs. (A movsd without a rep prefix is never a good choice.)
Another instruction that reads from one memory location and writes to another is pop or push with a memory operand, but that only works if the stack pointer was already pointing to one of the values you wanted to swap, and you didn't care about changing the stack pointer.
Don't go messing with the stack pointer. You can in theory save the stack pointer somewhere, and use it as another GP register for a loop where you're out of registers, but only if you don't need to call anything, and nothing asynchronous can happen that might try to use the stack while you have rsp not pointing at the stack. Seriously, it's really rare for even hand-written performance-tuned asm to use the stack pointer for anything but the normal use, so really just forget I mentioned it.
I've seen that in libc.so the actual variant of strcmp_sse to call is decided by the function strcmp itself.
Here is the code:
strcmp:
.text:000000000007B9F0 cmp cs:__cpu_features.kind, 0
.text:000000000007B9F7 jnz short loc_7B9FE
.text:000000000007B9F9 call __init_cpu_features
.text:000000000007B9FE
.text:000000000007B9FE loc_7B9FE: ; CODE XREF: .text:000000000007B9F7j
.text:000000000007B9FE lea rax, __strcmp_sse2_unaligned
.text:000000000007BA05 test cs:__cpu_features.cpuid._eax, 10h
.text:000000000007BA0F jnz short locret_7BA2B
.text:000000000007BA11 lea rax, __strcmp_ssse3
.text:000000000007BA18 test cs:__cpu_features.cpuid._ecx, 200h
.text:000000000007BA22 jnz short locret_7BA2B
.text:000000000007BA24 lea rax, __strcmp_sse2
.text:000000000007BA2B
.text:000000000007BA2B locret_7BA2B: ; CODE XREF: .text:000000000007BA0Fj
.text:000000000007BA2B ; .text:000000000007BA22j
.text:000000000007BA2B retn
What I do not understand is that the address of the strcmp_sse function to call is placed in rax and never actually called. Therefore I am wondering: who is going to call *rax? When?
The Linux dynamic linker supports a special symbol type called STT_GNU_IFUNC. strcmp is likely implemented as an IFUNC. 'Regular' symbols in a dynamic library are nothing more than a mapping from a name to an address. IFUNCs are a bit more complex than that: the address isn't readily available; in order to obtain it, the linker must execute a piece of code from the library itself. We are seeing an example of such a piece of code here. Note that in the x86_64 ABI a function returns its result in RAX.
This technique is typically used to pick the optimal implementation based on the CPU features. Please note that the selection logic runs only once; all but the first call to strcmp are fast.
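The select-once pattern can be imitated in plain C with a cached function pointer. This is only an analogy: with a real IFUNC the dynamic linker calls the resolver when it binds the symbol, whereas here the wrapper does it on first use. All names below are hypothetical, and the feature probe is a placeholder that always says no:

```c
#include <stddef.h>
#include <string.h>

typedef int (*strcmp_fn)(const char *, const char *);

/* Hypothetical implementations; a real library would provide CPU-specific versions. */
static int strcmp_generic(const char *a, const char *b) { return strcmp(a, b); }
static int strcmp_fast(const char *a, const char *b)    { return strcmp(a, b); }

/* Hypothetical feature probe; always reports "not available" in this sketch. */
static int cpu_has_fast_path(void) { return 0; }

static int resolver_calls = 0;

/* Plays the role of the resolver: returns the address of the chosen implementation. */
static strcmp_fn resolve_strcmp(void)
{
    resolver_calls++;
    return cpu_has_fast_path() ? strcmp_fast : strcmp_generic;
}

/* Cached pointer: the resolver runs on the first call only. */
int my_strcmp(const char *a, const char *b)
{
    static strcmp_fn chosen = NULL;
    if (chosen == NULL)
        chosen = resolve_strcmp();
    return chosen(a, b);
}
```

Like the resolver in the disassembly, resolve_strcmp only returns an address; it is the caller of the resolver that later jumps to it.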