I have this code, which is an alternative way to print a string of characters using the loop instruction.
data segment
mystr db "Hello World!"
ends
code segment
start:
mov ax, data
mov ds, ax
lea bx,mystr
mov cx,50
L1:
mov dl,[BX]
inc BX
cmp dl,'!'
je L2
mov ah,02
int 21h
loop L1
L2:
mov ax, 4c00h
int 21h
ends
end start
The lea instruction loads the address of mystr into the BX register. What does [BX] mean, and why does incrementing BX give us access to different parts of the string?
In Intel-style assembly code, square brackets ([..]) mean dereference -- access the memory pointed at by the thing in the brackets.
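So [bx] means access the memory pointed at by the bx register, and mov dl, [bx] means load a byte from that address and put it into dl. Incrementing bx makes it hold the address of the next byte, so each pass through the loop reads the next character of the string.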
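For readers more at home in C, the loop from the question can be sketched roughly as below. This is only an analogy: copy_until_bang and its parameters are hypothetical names, and writing into a buffer stands in for the DOS print call.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the loop in the question: copy bytes from
   src into out until a '!' is seen or `limit` bytes have been examined.
   `p` plays the role of BX, and `*p` is what [BX] means.
   out must have room for limit + 1 bytes. */
size_t copy_until_bang(const char *src, size_t limit, char *out)
{
    const char *p = src;        /* lea bx, mystr : p holds an address       */
    size_t n = 0;

    while (limit-- > 0) {       /* loop L1 : at most `limit` iterations     */
        char c = *p;            /* mov dl, [bx] : load the byte p points at */
        p++;                    /* inc bx : p now points at the next byte   */
        if (c == '!')           /* cmp dl, '!' / je L2                      */
            break;
        out[n++] = c;           /* stands in for mov ah, 02 / int 21h       */
    }
    out[n] = '\0';
    return n;
}
```

Called with "Hello World!" and a limit of 50, this stops at the '!' just as the assembly does, so the '!' itself is never output.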
I am a beginner in assembly and I have a simple question.
This is my code:
BITS 64 ; 64-bit mode
global strchr ; Export 'strchr'
SECTION .text ; Code section
strchr:
mov rcx, -1
.loop:
inc rcx
cmp byte [rdi+rcx], 0
je exit_null
cmp byte [rdi+rcx], sil
jne .loop
mov rax, [rdi+rcx]
ret
exit_null:
mov rax, 0
ret
This compiles but doesn't work. I want to reproduce the function strchr, as you can see. When I test my function with printf, it crashes (the problem isn't the test).
I know I can INC rdi directly to walk through the string and return it at the position I want.
But I just want to know if there is a way to return rdi at the position rcx to fix my code and probably improve it.
Your function strchr seems to expect two parameters:
pointer to a string in RDI, and
pointer to a character in RSI.
Register rcx is used as an index inside the string? In this case you should use al instead of cl. Be aware that you don't limit the search size. When the character referred to by RSI is not found in the string, it will probably trigger an exception. Perhaps you should test al loaded from [rdi+rcx] and quit further searching when al=0.
If you want it to return a pointer to the first occurrence of the character inside the string, just replace mov rax,[rdi+rcx] with lea rax,[rdi+rcx].
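The difference between the two instructions can be sketched in C. The function names address_of and qword_at are made up for illustration, and memcpy is used only to keep the 8-byte load well-defined in C.

```c
#include <stdint.h>
#include <string.h>

/* Roughly what `lea rax, [rdi+rcx]` computes: the address itself. */
const char *address_of(const char *s, size_t i)
{
    return s + i;
}

/* Roughly what `mov rax, [rdi+rcx]` computes: 8 bytes loaded from that
   address. The caller must make sure 8 bytes are readable at s + i. */
uint64_t qword_at(const char *s, size_t i)
{
    uint64_t v;
    memcpy(&v, s + i, sizeof v);
    return v;
}
```

A caller that treats the result of qword_at as a char * ends up dereferencing string bytes as if they were an address, which is a plausible reason for the crash under printf.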
Your code (from edit Version 2) does the following:
char* strchr ( char *p, char x ) {
    int i = -1;
    do {
        if ( p[i] == '\0' ) return NULL;
        i++;
    } while ( p[i] != x );
    return * (long long*) &(p[i]);   /* loads 8 bytes instead of returning &p[i] */
}
As @vitsoft says, your intention is to return a pointer, but the first return (in the assembly) returns a single quad word loaded from the address of the found character: 8 characters instead of an address.
It is unusual to increment in the middle of the loop. It is also odd to start the index at -1. On the first iteration, the loop continue condition looks at p[-1], which is not a good idea, since that's not part of the string you're being asked to search. If that byte happens to be the nul character, it'll stop the search right there.
If you waited to increment until both tests are performed, then you would not be referencing p[-1], and you could also start the index at 0, which would be more usual.
You might consider capturing the character into a register instead of using a complex addressing mode three times.
Further, you could advance the pointer in rdi and forgo the index variable altogether.
Here's that in C:
char* strchr ( char *p, char x ) {
    for(;;) {
        char c = *p;
        if ( c == '\0' )
            break;
        if ( c == x )
            return p;
        p++;
    }
    return NULL;
}
Thanks to your help, I finally did it!
Thanks to Erik's answer, I fixed a stupid mistake. I was comparing str[-1] to NULL, so it was causing an error.
And with vitsoft's answer, I switched mov to lea and it worked!
Here is my code:
strchr:
mov rcx, -1
.loop:
inc rcx
cmp byte [rdi+rcx], 0
je exit_null
cmp byte [rdi+rcx], sil
jne .loop
lea rax, [rdi+rcx]
ret
exit_null:
mov rax, 0
ret
The only bug remaining in the current version is loading 8 bytes of char data as the return value instead of just doing pointer math, using mov instead of lea. (After various edits removed and added different bugs, as reflected in different answers talking about different code).
But this is over-complicated as well as inefficient (two loads, and indexed addressing modes, and of course extra instructions to set up RCX).
Just increment the pointer since that's what you want to return anyway.
If you're going to loop 1 byte at a time instead of using SSE2 to check 16 bytes at once, strchr can be as simple as:
;; BITS 64 is useless unless you're writing a kernel with a mix of 32 and 64-bit code
;; otherwise it only lets you shoot yourself in the foot by putting 64-bit machine code in a 32-bit object file by accident.
global mystrchr
mystrchr:
.loop: ; do {
movzx ecx, byte [rdi] ; c = *p;
cmp cl, sil ; if (c == needle) return p;
je .found
inc rdi ; p++
test cl, cl
jnz .loop ; }while(c != 0)
;; fell out of the loop on hitting the 0 terminator without finding a match
xor edi, edi ; p = NULL
; optionally an extra ret here, or just fall through
.found:
mov rax, rdi ; return p
ret
I checked for a match before end-of-string so I'd still have the un-incremented pointer, and not have to decrement it in the "found" return path. If I started the loop with inc, I could use an [rdi - 1] addressing mode, still avoiding a separate counter. That's why I switched up the order of which branch was at the bottom of the loop vs. your code in the question.
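The same ordering can be written in C, as a rough sketch under the hypothetical name my_strchr. One side effect of testing for a match before testing for the terminator is that searching for '\0' returns a pointer to the terminator, which is how the standard strchr is specified to behave.

```c
#include <stddef.h>

/* Sketch of the match-first loop; my_strchr is a made-up name chosen to
   avoid clashing with the library function. */
char *my_strchr(const char *p, int needle)
{
    char x = (char)needle;
    for (;;) {
        char c = *p;            /* one load per byte, kept in a register */
        if (c == x)             /* match first: p is still un-incremented */
            return (char *)p;
        if (c == '\0')          /* hit the terminator without a match */
            return NULL;
        p++;
    }
}
```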
Since we want to compare the character twice, against SIL and against zero, I loaded it into a register. This might not run any faster on modern x86-64 which can run 2 loads per clock as well as 2 branches (as long as at most one of them is taken).
Some Intel CPUs can micro-fuse and macro-fuse cmp reg,mem / jcc into a single load+compare-and-branch uop for the front-end, at least when the memory addressing mode is simple, not indexed. But not cmp [mem], imm/jcc, so we're not costing any extra uops for the front-end on Intel CPUs by separately loading into a register. (With movzx to avoid a false dependency from writing a partial register like mov cl, [rdi])
Note that if your caller is also written in assembly, it's easy to return multiple values, e.g. a status and a pointer (in the not-found case, perhaps to the terminating 0 would be useful). Many C standard library string functions are badly designed, notably strcpy, to not help the caller avoid redoing length-finding work.
Especially on modern CPUs with SIMD, explicit lengths are quite useful to have: a real-world strchr implementation would check alignment, or check that the given pointer isn't within 16 bytes of the end of a page. But memchr doesn't have to, if the size is >= 16: it could just do a movdqu load and pcmpeqb.
See Is it safe to read past the end of a buffer within the same page on x86 and x64? for details and a link to glibc strlen's hand-written asm. Also Find the first instance of a character using simd for real-world implementations like glibc's using pcmpeqb / pmovmskb. (And maybe pminub for the 0-terminator check to unroll over multiple vectors.)
SSE2 can go about 16x faster than the code in this answer for non-tiny strings. For very large strings, you might hit a memory bottleneck and "only" be about 8x faster.
I have some confusion about the esp pointer.
Below is code that appeared in one of the previous exams.
The returned value is 1.
func: xor eax,eax
call L3
L1: call dword[esp]
inc eax
L2: ret
L3: call dword[esp]
L4: ret
Now I will explain my reasoning, and I hope someone will correct or confirm it.
This is how I reason once I already know the answer, so I am not sure I'm thinking correctly at all.
eax = 0
We push to stack return address which is the next line, i.e label L1.
We jump to L3.
We push to stack return address which is the next line, i.e label L4.
We jump to L1.
We push to stack return address which is the next line, i.e inc eax.
We jump to L4.
We jump to line where is inc eax and stack is now empty.
eax = 1.
we end here(at label L2) and return 1.
I think eax = 2 and the caller of func is called once from the instruction at L1. In the following I will trace execution to show you what I mean.
I re-arranged your example a bit to make it more readable. This is NASM source. I think this should be equivalent to the original (assuming the D bit and B bit are set, i.e. you're running in normal 32-bit mode).
bits 32
func:
xor eax, eax
call L3
.returned:
L1:
call near [esp]
.returned:
inc eax
L2:
retn
L3:
call near [esp]
.returned:
L4:
retn
Now assume we start with some function that does this:
foo:
call func
.returned:
X
retn
This is what happens:
1. At foo we call func. The address of foo.returned is put on the stack (say, stack slot -1).
2. At func we set eax to zero.
3. Next we call L3. The address of func.returned = L1 is put on the stack (slot -2).
4. At L3 we call the dword on top of the stack. Here this is func.returned. The address of L3.returned = L4 is put on the stack (slot -3).
5. At L1 we call the dword on top of the stack. Here this is L3.returned. The address of L1.returned is put on the stack (slot -4).
6. At L4 we return. This pops L1.returned (from slot -4) into eip.
7. At L1.returned we do inc eax, setting eax to 1.
8. Then at L2 we return. This pops L3.returned (from slot -3) into eip.
9. At L4 we return. This pops func.returned (from slot -2) into eip.
10. At L1 we call the dword on top of the stack. Here this is foo.returned. The address of L1.returned is put on the stack (slot -2).
11. At foo.returned we execute whatever is there which I marked X. Assuming that the function returns using a retn eventually then...
12. ... we return. This pops L1.returned (from slot -2) into eip.
13. At L1.returned we do inc eax. Assuming that X did not alter eax then we now have eax = 2.
14. Then at L2 we return. This pops foo.returned (from slot -1) into eip.
If my assumptions are correct then eax is 2 in the end.
Note that it is really strange to call the return address on top of the stack. I cannot imagine a practical use for this.
Also note that if, in a debugger, you proceed-step the call to func in foo then at step 11 the debugger might return control to the user, with eax equal to 1. However, the stack is not balanced at this point.
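The trace can be double-checked with a small simulation in C. This is a toy model, not real x86: labels become enum values, the stack is an int array, and X is assumed to leave eax alone. All names here (simulate, trace_result, HALT) are invented for the sketch.

```c
/* Labels of the NASM listing, used as program-counter values.
   HALT stands for the return address of whoever called foo. */
enum { FOO, FOO_RET, FUNC, L1, L1_RET, L2, L3, L4, HALT };

struct trace_result {
    int eax_final;          /* eax when foo finally returns to its caller */
    int eax_first_return;   /* eax the first time foo.returned is reached */
    int foo_ret_visits;     /* how many times foo.returned executes       */
};

struct trace_result simulate(void)
{
    int stack[16];
    int sp = 0;             /* used slots; stack[sp - 1] models [esp] */
    int eax = -1;
    int pc = FOO;
    int target;
    struct trace_result r = { 0, 0, 0 };

    stack[sp++] = HALT;

    while (pc != HALT) {
        switch (pc) {
        case FOO:           /* call func */
            stack[sp++] = FOO_RET;
            pc = FUNC;
            break;
        case FUNC:          /* xor eax, eax ; call L3 */
            eax = 0;
            stack[sp++] = L1;
            pc = L3;
            break;
        case L1:            /* call near [esp] */
            target = stack[sp - 1];
            stack[sp++] = L1_RET;
            pc = target;
            break;
        case L1_RET:        /* inc eax, then fall through to L2 */
            eax++;
            pc = L2;
            break;
        case L3:            /* call near [esp] */
            target = stack[sp - 1];
            stack[sp++] = L4;
            pc = target;
            break;
        case FOO_RET:       /* X (assumed not to touch eax), then retn */
            if (r.foo_ret_visits++ == 0)
                r.eax_first_return = eax;
            pc = stack[--sp];
            break;
        case L2:            /* retn */
        case L4:            /* retn */
            pc = stack[--sp];
            break;
        }
    }
    r.eax_final = eax;
    return r;
}
```

In this model foo.returned is reached twice: once with eax equal to 1 and an unbalanced stack, and once more with eax equal to 2 just before foo really returns.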
For example, I want to move a DWORD value in a register into a memory location typed WORD, but am getting errors:
mov [arr + eax*TYPE arr], edx ; error: operands must be same size
The [] brackets dereference to an array element of type WORD.
I've tried doing this as well:
mov dx, edx ; error: operands must be same size.
mov [arr + eax*TYPE arr], dx
Also no luck trying to use PTR:
mov dx, WORD PTR edx ; error: invalid use of register
OR
mov WORD PTR [arr + eax*TYPE arr], edx ; error: invalid use of register
OR
mov [arr + eax*TYPE arr], WORD PTR edx ; error: invalid use of register
Solution? Thanks for any help!
The register DX is actually the low 16 bits of the 32-bit register EDX. You don't need to mov dx, edx because DX is already there. So you simply need to store DX in the word-sized variable:
mov [word_variable], dx
Of course, the upper 16 bits of edx will be lost in such a transfer.
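In C terms, storing DX behaves like a truncating conversion from a 32-bit value to a 16-bit one. A minimal sketch, with low_word as a made-up name:

```c
#include <stdint.h>

/* DX is the low 16 bits of EDX, so storing DX keeps only those bits. */
uint16_t low_word(uint32_t edx)
{
    return (uint16_t)(edx & 0xFFFFu);
}
```

For example, low_word(0x12345678) yields 0x5678; the 0x1234 in the upper half is dropped.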
I have recently started delving deeper into Assembly, and I could not determine what is going on when I analyze the following code segment. Essentially, 0xFFFFFFFF is moved into EAX and then 0x10 is added to it. When viewing EAX within GDB, the value after execution is 0xF rather than 0x9. When I add 0x11, rather than 0x10, the proper result (0x10) is displayed. Any help would be much appreciated.
I have attached debug output below.
The first value after command execution is EAX, which is displayed using print/x $eax.
(gdb) ni
$11 = 0xffffffff
Dump of assembler code from 0x8048096 to 0x80480a0:
=> 0x08048096 <_start+22>: add eax,0x10
0x08048099 <_start+25>: mov eax,0x0
0x0804809e <_start+30>: add BYTE PTR ds:0x804910c,0x22
End of assembler dump.
0x08048096 in _start ()
(gdb) ni
$13 = 0xf
Dump of assembler code from 0x8048099 to 0x80480a3:
=> 0x08048099 <_start+25>: mov eax,0x0
0x0804809e <_start+30>: add BYTE PTR ds:0x804910c,0x22
End of assembler dump.
0x08048099 in _start ()
You appear to expect that adding 0x10 to -1 (0xFFFFFFFF) will produce 0x9.
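But 0x10 is 16 (decimal), and adding -1 to it produces 15 (decimal), which is 0xF. Viewed as unsigned, 0xFFFFFFFF + 0x10 is 0x10000000F, which does not fit in 32 bits, so EAX keeps only the low 32 bits: 0xF.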
So everything is working here just as it should.
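The same arithmetic can be reproduced in C with a fixed-width unsigned type, which wraps modulo 2^32 the way a 32-bit register does. The name add32 is just for illustration.

```c
#include <stdint.h>

/* Models `add eax, imm32`: the result is truncated to 32 bits. */
uint32_t add32(uint32_t eax, uint32_t imm)
{
    return eax + imm;
}
```

Here add32(0xFFFFFFFF, 0x10) gives 0xF and add32(0xFFFFFFFF, 0x11) gives 0x10, matching both results reported in the question.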