Issue with strchr() function implementation - unix

I've recently started looking into assembly and I'm trying to recode some basic system functions to get a grip on it. I'm currently stuck on a segmentation fault at address 0x0 in my strchr:
section .text
global strchr
strchr:
xor rax, rax
loop:
cmp BYTE [rdi + rax], 0
jz end
cmp sil, 0
jz end
cmp BYTE [rdi + rax], sil
jz good
inc rax
jmp loop
good:
mov rax, [rdi + rcx]
ret
end:
mov rax, 0
ret
I can't figure out how to debug it using GDB, and the documentation I've come across is pretty limited or hard to understand.
I'm using the following main in C to test it:
#include <stdio.h>
extern char *strchr(const char *s, int c);
int main () {
const char str[] = "random.string";
const char ch = '.';
char *ret;
ret = strchr(str, ch);
printf("%s\n", ret);
printf("String after |%c| is - |%s|\n", ch, ret);
return(0);
}

The Problem
The instruction immediately following the good label:
mov rax, [rdi + rcx]
should actually be:
lea rax, [rdi + rax]
You were indexing with rcx, which is never set, instead of rax. Also, what you need is the address of that position, not the value stored there (i.e. lea instead of mov).
Some Advice
Note that the typical idiom for comparing sil against zero is test sil, sil rather than cmp sil, 0. It would then be:
test sil, sil
jz end
However, if we look at the strchr(3) man page, we can find the following:
char *strchr(const char *s, int c);
The terminating null byte is considered part of the string, so that if c is specified as '\0', these functions return a pointer to the terminator.
So, if we want this strchr() implementation to behave as described in the man page, the following code must be removed:
cmp sil, 0
jz end
The typical zeroing idiom for the rax register is neither mov rax, 0 nor xor rax, rax, but rather xor eax, eax: it doesn't have to encode an immediate zero, it is one byte shorter than xor rax, rax, and writing eax zero-extends into rax.
With the correction and the advice above, the code would look like the following:
section .text
global strchr
strchr:
xor eax, eax
loop:
; Is matched? Checked first so that c == '\0' matches the terminator.
cmp BYTE [rdi + rax], sil
jz good
; Is end of string?
cmp BYTE [rdi + rax], 0
jz end
inc rax
jmp loop
good:
lea rax, [rdi + rax]
ret
end:
xor eax, eax
ret
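For checking the assembly against known-good behaviour, a C model of the same loop can help. The sketch below is only an illustration (the name my_strchr is made up here and is not the libc implementation). It compares against c before testing for the terminator, so that a search for '\0' returns a pointer to the terminator as the man page describes, and it returns the address of the matching byte rather than the byte itself, which is the lea versus mov distinction.

```c
#include <stddef.h>
#include <string.h>   /* only needed by callers that compare results with strcmp */

/* Hypothetical reference model of the assembly loop; not the libc implementation. */
char *my_strchr(const char *s, int c)
{
    size_t i = 0;                       /* plays the role of rax */

    for (;;) {
        if (s[i] == (char)c)            /* cmp BYTE [rdi + rax], sil */
            return (char *)&s[i];       /* lea rax, [rdi + rax]: the address, not the byte */
        if (s[i] == '\0')               /* cmp BYTE [rdi + rax], 0 */
            return NULL;                /* xor eax, eax */
        i++;                            /* inc rax */
    }
}
```

Calling my_strchr("random.string", '.') yields a pointer to ".string", and a character that does not occur yields NULL.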

Related

Recursion in Assembly x86: Fibonacci

I am trying to code a recursive fibonacci sequence in assembly, but it is not working for some reason.
It does not give an error, but the output number is always wrong.
section .bss
_bss_start:
; store max iterations
max_iterations resb 4
_bss_end:
section .text
global _start
; initialise the function variables
_start:
mov dword [max_iterations], 11
mov edx, 0
push 0
push 1
jmp fib
fib:
; clear registers
mov eax, 0
mov ebx, 0
mov ecx, 0
; handle fibonacci math
pop ebx
pop eax
add ecx, eax
add ecx, ebx
push eax
push ebx
push ecx
; increment counter / exit conditions
inc edx
cmp edx, [max_iterations]
je print
; recursive step
call fib
ret
print:
mov eax, 1
pop ebx
int 0x80
For instance, the above code prints a value of 79 rather than 144 (11th fibonacci number).
Alternatively, if I make
mov dword [max_iterations], 4
Then the above code prints 178 rather than 5 (5th fibonacci number).
Does anyone have an idea?
As an approach, you should try to debug it with the smallest possible input, like 1 iteration. That will be most revealing, as you can watch it do the wrong thing in great detail without worrying about multiple levels of recursion. When that works, go to 2 iterations.
When you use complex addressing modes, it is harder to debug as we cannot see what the processor is doing.  So, when an instruction using a complex addressing mode doesn't work, and you want to debug it, then split that instruction into 2 instructions as follows:
mov dword [fibonacci_seq + edx + 4], ecx
---
lea esi, [fibonacci_seq + edx + 4]
mov [esi], ecx
With the alternate code sequence, you can observe the value of the addressing mode computation, which will provide you with additional debugging insight.
As another example:
cmp edx, [max_iterations]
---
mov edi, [max_iterations]
cmp edx, edi
Using the 2 instruction version, you will be able to see what value the processor is comparing edx with.
Or better, do that mov load once before the loop, so you keep the loop bound in a register the whole time. That's what you should normally do when you have enough registers, only using memory when you run out.
You are jmping to fib from one place in the code and calling it from another. Your logic should still work, because once you've reached the limit you never return to the main code, but mixing main code with a function like this is really bad form. More on that below...
mov dword [fibonacci_seq + edx + 4], ecx
Is this working for you? You're only incrementing edx by 1.  Perhaps you wanted:
mov dword [fibonacci_seq + edx * 4], ecx
I would argue that your code is not really recursive.
call fib ; jumps to fib, pushes a return address
ret ; never, ever reached, so, pointless
---
jmp fib ; iterate w/o pushing unwanted return address onto the stack
The single jmp instruction is superior to the call as a mechanism to iterate, in part because it doesn't push an unnecessary return address onto the stack.
When you debug with 2 iterations, you'll probably see that the unused return address pushed by the call ends up among the values your pops treat as "parameters".
To expand on the "recursion", when the iteration stops and control transfers to print, there will be some 11 (depending on iteration count) unused return addresses on the stack (modulo the interference by the pop's and pushes).
The recursive call is only used for iteration, the recursion never unwinds.  Thus, I would argue it's not recursive (not even tail recursive) — it just erroneously pushes some unused return addresses onto the stack — that's not recursion.
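To make the iteration-versus-recursion point concrete, here is a rough C model of what the assembly is meant to compute. It is a sketch, not the asker's code: fib_after is a name invented here, and the starting pair (0, 1) and the iteration counts are taken from the question. The loop corresponds to the jmp fib version: nothing is pushed per step, and the loop bound stays in a local variable (a register, in practice).

```c
/* Hypothetical helper: runs `iterations` steps of the two-value update
 * from the question and returns the last value computed. */
unsigned fib_after(unsigned iterations)
{
    unsigned n0 = 0, n1 = 1;            /* the two values the question pushes first */

    for (unsigned i = 0; i < iterations; i++) {   /* jmp fib: no return address pushed */
        unsigned next = n0 + n1;        /* add ecx, eax / add ecx, ebx */
        n0 = n1;
        n1 = next;
    }
    return n1;
}
```

Under these assumptions fib_after(11) gives 144 and fib_after(4) gives 5, the values the question expects.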
This line is your main problem:
mov dword [fibonacci_seq + edx + 4], ecx
Because of the +4, you never write to the first entry of the "array". And because you only increment EDX by 1, each write to the array overwrites 3 bytes of the previous entry. Try this instead:
mov dword [fibonacci_seq + edx * 4], ecx
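The overlap can be reproduced outside the assembler. The following C sketch is an illustration only: store_dword_le and load_dword_le are helpers made up here to mimic a 32-bit little-endian x86 store and load at a byte offset, and the 16-byte buffer stands in for fibonacci_seq. It contrasts the byte offsets edx + 4 and edx * 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mimics `mov dword [base + off], value` on little-endian x86. */
void store_dword_le(uint8_t *base, size_t off, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        base[off + i] = (uint8_t)(value >> (8 * i));
}

/* Hypothetical helper: mimics `mov reg, dword [base + off]`. */
uint32_t load_dword_le(const uint8_t *base, size_t off)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)base[off + i] << (8 * i);
    return v;
}

/* Stores 1 and 2 for edx = 0 and edx = 1 using offset edx + 4, then returns
 * what is left in the dword that the first store filled. */
uint32_t first_entry_after_overlap(void)
{
    uint8_t seq[16] = {0};
    store_dword_le(seq, 0 + 4, 1);   /* edx = 0: bytes 4..7 */
    store_dword_le(seq, 1 + 4, 2);   /* edx = 1: bytes 5..8, clobbers 3 bytes of the above */
    return load_dword_le(seq, 4);
}

/* Same two stores with offset edx * 4: the entries no longer overlap. */
uint32_t first_entry_scaled(void)
{
    uint8_t seq[16] = {0};
    store_dword_le(seq, 0 * 4, 1);   /* bytes 0..3 */
    store_dword_le(seq, 1 * 4, 2);   /* bytes 4..7 */
    return load_dword_le(seq, 0);
}
```

With the edx + 4 offsets the first entry reads back as 0x201 instead of 1, because the low byte of the second store lands inside it. With edx * 4 it reads back as 1.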
A bit of a redesign, as I had not realised that the call instruction uses the stack in this way. The solution is here:
section .bss
_bss_start:
; store max iterations and current iteration
max_iterations resb 4
iteration resb 4
; store arguments
n_0 resb 4
n_1 resb 4
_bss_end:
section .text
global _start
; initialise the function variables
_start:
mov dword [max_iterations], 11
mov dword [iteration], 0
mov dword [n_0], 0
mov dword [n_1], 1
jmp fib
fib:
mov ecx, 0
mov edx, 0
mov eax, [n_0]
mov ebx, [n_1]
add ecx, eax
add ecx, ebx
mov edx, [n_1]
mov dword [n_0], edx
mov dword [n_1], ecx
mov edx, [iteration]
inc edx
mov dword [iteration], edx
cmp edx, [max_iterations]
je print
call fib
ret
print:
mov eax, 1
mov ebx, [n_1]
int 0x80

How to print LOCAL byte with WinApi's WriteConsole

It's hard for me to clarify my question, but I'll try. I'm trying to learn MASM32 and I have a task to print some text to the console without using .data or .const. The problem is that LOCAL puts the variable on the stack, not in static memory. So I can't get its address with offset, and WriteConsole needs a pointer to the text in memory. Any thoughts on how to deal with this problem? Thanks!
I have this:
.data
string db 10 'somestring'
.code
WriteToConsole PROC
LOCAL handle :DWORD
invoke GetStdHandle, -11
mov handle, eax
mov edx, offset string
invoke WriteConsoleA, handle, edx, 10, 0, 0
xor eax, eax
ret
WriteToConsole ENDP
And I want something like that:
.code
WriteToConsole PROC
LOCAL string[10] :SBYTE
LOCAL handle :DWORD
invoke GetStdHandle, -11
mov handle, eax
mov edx, offset string ;impossible because of stack
invoke WriteConsoleA, handle, edx, 10, 0, 0 ;can't call without a pointer
xor eax, eax
ret
WriteToConsole ENDP
Well, I found an answer:
LOCAL string[10] :DWORD
lea edx, string
invoke WriteConsoleA, handle, edx, stringlength, 0, 0
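Loading the effective address with lea instead of taking an offset helps! A LOCAL sits at an ebp-relative stack address that is only known at run time, so offset cannot produce it.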
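The same idea expressed in C, as a rough analogy rather than WinAPI code: a local array lives on the stack, and its address is computed at run time, which is what lea does. The function name write_local_text and the use of fwrite in place of WriteConsoleA are assumptions made for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical analogue of WriteToConsole: the text is assembled in a
 * stack-local buffer and its run-time address is passed to the writer. */
size_t write_local_text(FILE *out)
{
    char text[10];                      /* like a 10-byte LOCAL */

    /* Fill the local at run time, one byte at a time. */
    text[0] = 's'; text[1] = 'o'; text[2] = 'm'; text[3] = 'e'; text[4] = 's';
    text[5] = 't'; text[6] = 'r'; text[7] = 'i'; text[8] = 'n'; text[9] = 'g';

    /* `text` decays to the buffer's stack address, the value lea computes. */
    return fwrite(text, 1, sizeof text, out);
}
```

Calling write_local_text(stdout) writes the ten bytes and returns the number of bytes written.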

When is tail recursion guaranteed in Rust?

C language
In the C programming language, it's easy to have tail recursion:
int foo(...) {
return foo(...);
}
Just return the result of the recursive call as is. This matters especially when the recursion may repeat a thousand or even a million times: without tail call elimination, it would use a lot of stack memory.
Rust
Now, I have a Rust function that might recursively call itself a million times:
fn read_all(input: &mut dyn std::io::Read) -> std::io::Result<()> {
match input.read(&mut [0u8]) {
Ok ( 0) => Ok(()),
Ok ( _) => read_all(input),
Err(err) => Err(err),
}
}
(this is a minimal example, the real one is more complex, but it captures the main idea)
Here, the return value of the recursive call is returned as is, but:
Is the Rust compiler guaranteed to turn this into a tail call?
For instance, if we declare some variable that needs to be destroyed, like a Vec, will it be destroyed just before the recursive call (which allows for tail recursion) or after the recursive call returns (which forbids the tail recursion)?
Shepmaster's answer explains that tail call elimination is merely an optimization, not a guarantee, in Rust. But "never guaranteed" doesn't mean "never happens". Let's take a look at what the compiler does with some real code.
Does it happen in this function?
As of right now, the latest release of Rust available on Compiler Explorer is 1.39, and it does not eliminate the tail call in read_all.
example::read_all:
push r15
push r14
push rbx
sub rsp, 32
mov r14, rdx
mov r15, rsi
mov rbx, rdi
mov byte ptr [rsp + 7], 0
lea rdi, [rsp + 8]
lea rdx, [rsp + 7]
mov ecx, 1
call qword ptr [r14 + 24]
cmp qword ptr [rsp + 8], 1
jne .LBB3_1
movups xmm0, xmmword ptr [rsp + 16]
movups xmmword ptr [rbx], xmm0
jmp .LBB3_3
.LBB3_1:
cmp qword ptr [rsp + 16], 0
je .LBB3_2
mov rdi, rbx
mov rsi, r15
mov rdx, r14
call qword ptr [rip + example::read_all#GOTPCREL]
jmp .LBB3_3
.LBB3_2:
mov byte ptr [rbx], 3
.LBB3_3:
mov rax, rbx
add rsp, 32
pop rbx
pop r14
pop r15
ret
mov rbx, rax
lea rdi, [rsp + 8]
call core::ptr::real_drop_in_place
mov rdi, rbx
call _Unwind_Resume#PLT
ud2
Notice this line: call qword ptr [rip + example::read_all#GOTPCREL]. That's the (tail) recursive call. As you can tell from its existence, it was not eliminated.
Compare this to an equivalent function with an explicit loop:
pub fn read_all(input: &mut dyn std::io::Read) -> std::io::Result<()> {
loop {
match input.read(&mut [0u8]) {
Ok ( 0) => return Ok(()),
Ok ( _) => continue,
Err(err) => return Err(err),
}
}
}
which has no tail call to eliminate, and therefore compiles to a function with only one call in it (to the computed address of input.read).
Oh well. Maybe Rust isn't as good as C. Or is it?
Does it happen in C?
Here's a tail-recursive function in C that performs a very similar task:
int read_all(FILE *input) {
char buf[] = {0, 0};
if (!fgets(buf, sizeof buf, input))
return feof(input);
return read_all(input);
}
This should be super easy for the compiler to eliminate. The recursive call is right at the bottom of the function and C doesn't have to worry about running destructors. But nevertheless, there's that recursive tail call, annoyingly not eliminated:
call read_all
Tail call optimization is not guaranteed to happen in C, either. No compiler I tried would be convinced to turn this into a loop on its own initiative.
Since version 13, clang supports a non-standard musttail attribute you can add to tail calls that should be eliminated. Adding this attribute to the C code successfully eliminates the tail call. However, rustc currently has no equivalent attribute (although the become keyword is reserved for this purpose).
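As a rough sketch of what that looks like: the macro MUSTTAIL and the function sum_to below are names invented for this example. The macro expands to clang's musttail attribute when the compiler reports support for it and to nothing otherwise, so other compilers still build the code, but without any guarantee.

```c
/* Use clang's statement attribute when available; otherwise fall back to a
 * plain return, which the optimizer may or may not turn into a jump. */
#if defined(__clang__)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical example: adds n + (n - 1) + ... + 1 to acc via a tail call. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    MUSTTAIL return sum_to(n - 1, acc + n);
}
```

For example, sum_to(100, 0) returns 5050. When the fallback is in effect, keep the depth modest: a call such as sum_to(1000000, 0) can overflow the stack in an unoptimized build.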
Does it ever happen in Rust?
Okay, so it's not guaranteed. Can the compiler do it at all? Yes! Here's a function that computes Fibonacci numbers via a tail-recursive inner function:
pub fn fibonacci(n: u64) -> u64 {
fn f(n: u64, a: u64, b: u64) -> u64 {
match n {
0 => a,
_ => f(n - 1, a + b, a),
}
}
f(n, 0, 1)
}
Not only is the tail call eliminated, the whole inner function f is inlined into fibonacci, yielding only 12 instructions (and not a call in sight):
example::fibonacci:
push 1
pop rdx
xor ecx, ecx
.LBB0_1:
mov rax, rdx
test rdi, rdi
je .LBB0_3
dec rdi
add rcx, rax
mov rdx, rcx
mov rcx, rax
jmp .LBB0_1
.LBB0_3:
ret
If you compare this to an equivalent while loop, the compiler generates almost the same assembly.
What's the point?
You probably shouldn't be relying on optimizations to eliminate tail calls, either in Rust or in C. It's nice when it happens, but if you need to be sure that a function compiles into a tight loop, the surest way, at least for now, is to use a loop.
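For comparison, here is the same accumulator pattern written in C, once as a tail-recursive helper and once as the loop it is equivalent to. This is a sketch: fib_step, fib_rec and fib_loop are names chosen for this example, and fib_step mirrors the Rust f(n, a, b).

```c
#include <stdint.h>

/* Tail-recursive helper mirroring the Rust inner function f(n, a, b). */
static uint64_t fib_step(uint64_t n, uint64_t a, uint64_t b)
{
    if (n == 0)
        return a;
    return fib_step(n - 1, a + b, a);   /* tail call: elimination is up to the compiler */
}

uint64_t fib_rec(uint64_t n)
{
    return fib_step(n, 0, 1);
}

/* The same computation as an explicit loop: constant stack use by construction. */
uint64_t fib_loop(uint64_t n)
{
    uint64_t a = 0, b = 1;
    while (n != 0) {
        uint64_t next = a + b;
        b = a;
        a = next;
        n--;
    }
    return a;
}
```

Both functions return 55 for n = 10. Only fib_loop is certain to run in constant stack space regardless of compiler and optimization level.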
Neither tail recursion (reusing a stack frame for a tail call to the same function) nor tail call optimization (reusing the stack frame for a tail call to any function) is ever guaranteed by Rust, although the optimizer may choose to perform them.
if we declare some variable that needs to be destroyed
It's my understanding that this is one of the sticking points, as changing the location of destroyed stack variables would be contentious.
See also:
Recursive function calculating factorials leads to stack overflow
RFC 81: guaranteed tail call elimination
RFC 1888: Proper tail calls

Nasm - access struct elements by value and by address

I started to code in NASM assembly lately and my problem is that I don't know how I access struct elements the right way. I already searched for solutions on this site and on google but everywhere I look people say different things. My program is crashing and I have the feeling the problem lies in accessing the structs.
When looking at the example code:
STRUC Test
.normalValue RESD 1
.address RESD 1
ENDSTRUC
TestStruct:
istruc Test
at Test.normalValue dd ffff0000h
at Test.address dd 01234567h
iend
;Example:
mov eax, TestStruct ; moves pointer to first element to eax
mov eax, [TestStruct] ; moves content of the dereferenced pointer to eax (same as mov eax, ffff0000h)
mov eax, TestStruct
add eax, 4
mov ebx, eax ; moves pointer to the second element (4 because RESD 1)
mov eax, [TestStruct+4] ; moves content of the dereferenced pointer to eax (same as mov eax, 01234567h)
mov ebx, [eax] ; moves content at the address 01234567h to ebx
Is that right?
Help is appreciated
I don't know if you figured it out, but here is your code with a few small modifications that make it work. All instructions are correct except the last one, mov ebx, [eax], which fails as expected because you are trying to access the content at address 0x1234567, resulting in SIGSEGV.
section .bss
struc Test
normalValue RESD 1
address RESD 1
endstruc
section .data
TestStruct:
istruc Test
at normalValue, dd 0xffff0000
at address, dd 0x01234567
iend
section .text
global _start
_start:
mov eax, TestStruct ; moves pointer to first element to eax
mov eax, [TestStruct] ; moves content of the dereferenced pointer to eax same as mov eax, ffff0000h
mov eax, TestStruct
add eax, 4
mov ebx, eax ; moves pointer to the second element 4 because RESD 1
mov eax, [TestStruct+4] ; moves content of the dereferenced pointer to eax same as mov eax, 01234567h
mov ebx, [eax] ; moves content at the address 01234567h to ebx
Compile, link and run step by step with nasm -f elf64 main.nasm -o main.o; ld main.o -o main; gdb main
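The by-value versus by-address distinction maps onto C as follows. This is an analogy sketched under assumptions, not NASM output: struct Test below is a hypothetical C mirror of the NASM struc, using uint32_t for each RESD 1 field, and the function names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the NASM struc: two 32-bit fields. */
struct Test {
    uint32_t normalValue;   /* offset 0 */
    uint32_t address;       /* offset 4, because normalValue is RESD 1 */
};

static struct Test TestStruct = { 0xffff0000u, 0x01234567u };

/* mov eax, [TestStruct]: the content of the first field. */
uint32_t first_field_value(void)
{
    return TestStruct.normalValue;
}

/* mov eax, TestStruct / add eax, 4: a pointer to the second field. */
uint32_t *second_field_pointer(void)
{
    return &TestStruct.address;
}

/* mov eax, [TestStruct+4]: the content of the second field. */
uint32_t second_field_value(void)
{
    return *second_field_pointer();
}
```

Here offsetof(struct Test, address) is 4, matching the +4 used in the assembly.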

print a negative number in NASM

I have a problem with my function in assembly language (NASM).
I want to know why, when I enter a negative number, my function prints an unsigned int.
I'm currently working on mac OSX.
I wonder if someone could explain the error to me.
My prototype :
;void ft_putnbr(int n);
When I enter a negative value in my function, such as -1, it prints 4294954951.
section .text
global _ft_putnbr
_ft_putnbr :
push rbp
mov rbp, rsp
xor rbx, rbx
mov rbx, rdi
cmp rbx, 0x0
jge _printnb
neg rbx
mov rax, SYS_WRITE
mov rdi, 1
mov [rsi], byte 45
mov rdx, 1
syscall
_printnb :
cmp rbx, 0x9
jg _recursion
mov [rsi], byte 48
add [rsi], rbx
mov rdi, 1
mov rax, SYS_WRITE
mov rdx, 1
syscall
jmp _return
_recursion :
xor rdx, rdx
xor rax, rax
mov rcx, 10
mov rax, rbx
div rcx
push rdx
mov rdi, rax
call _ft_putnbr
pop rdi
call _ft_putnbr
jmp _return
_return :
leave
ret
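int is 32 bits, but you use 64-bit arithmetic, so a negative n in edi is not seen as negative once you compare the whole of rbx. A simple but ugly fix is to sign-extend it to 64 bits so you don't have to change the rest of the code: instead of mov rbx, rdi do movsxd rbx, edi (NASM spells the 32-to-64-bit form of movsx as movsxd). Alternatively, change the function prototype to use a 64-bit type, probably long.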
Also note that you use rsi without initialization, and that's very bad.
PS: learn to use a debugger.
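The difference between the two widenings can be seen in a short C sketch. The helper names are invented for this illustration: widen_zero_extended models the 64-bit code when the upper half of rdi happens to be zero (the upper bits of a register holding an int argument are not something to rely on), widen_sign_extended models sign-extending edi into rbx, and format_int is a hypothetical putnbr that writes into a buffer instead of calling write.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>   /* only needed by callers that compare results with strcmp */

/* What the 64-bit code sees if the upper 32 bits of rdi happen to be zero. */
int64_t widen_zero_extended(int32_t n)
{
    return (int64_t)(uint32_t)n;
}

/* What sign-extending edi into rbx produces. */
int64_t widen_sign_extended(int32_t n)
{
    return (int64_t)n;
}

/* Hypothetical putnbr model: formats n into out (at least 12 bytes) and returns out.
 * Widening before negating keeps INT32_MIN from overflowing. */
char *format_int(int32_t n, char *out)
{
    int64_t v = widen_sign_extended(n);
    char digits[11];
    size_t count = 0, pos = 0;

    if (v < 0) {
        out[pos++] = '-';
        v = -v;
    }
    do {
        digits[count++] = (char)('0' + v % 10);   /* least significant digit first */
        v /= 10;
    } while (v != 0);
    while (count > 0)
        out[pos++] = digits[--count];             /* reverse into the output */
    out[pos] = '\0';
    return out;
}
```

For n = -1, the zero-extended view is 4294967295 while the sign-extended view is -1, and format_int produces "-1".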

Resources