Assembly early return on a recursive function - recursion

This is more an academic exercise than anything else, but I'm looking to write a recursive function in assembly that, if it receives an "interrupt signal", returns to the main function and not just to the function that invoked it (which is usually the same recursive function).
For this test, I'm doing a basic countdown and printing one-character digits (8...7...6...etc.). To simulate an "interrupt", I am using the number 7, so when the function hits 7 (if it starts above that), it will return 1, meaning it was interrupted; if it wasn't interrupted, it'll count down to zero. Here is what I have thus far:
.globl _start
_start:
# countdown(8);
mov $8, %rdi
call countdown
# return 0;
mov %eax, %edi
mov $60, %eax
syscall
print:
push %rbp
mov %rsp, %rbp
# write the value to a memory location
pushq %rdi # now 16-byte aligned
addb $'0', -8(%rbp)
movb $'\n', -7(%rbp)
# do a write syscall
mov $1, %rax # linux syscall write
mov $1, %rdi # file descriptor: stdout=1
lea -8(%rbp), %rsi # memory location of string goes in rsi
mov $2, %rdx # length: 1 char + newline
syscall
# restore the stack
pop %rdi
pop %rbp
ret
countdown:
# this is the handler to call the recursive function so it can
# pass the address to jump back to in an interrupt as one of the
# function parameters
# (%rsp) currently holds the return address, and let's pass that as the second argument
mov %rdi, %rdi # redundant, but for clarity
mov (%rsp), %rsi # return address to jump
call countdown_recursive
countdown_recursive:
# bool countdown(int n: n<10, return_address)
# ...{
push %rbp
mov %rsp, %rbp
# if (num == 0) ... return
cmp $0, %rdi
jz end
# imaginary interrupt on num=7
cmp $7, %rdi
jz fast_ret
# else...printf("%d\n", num);
push %rsi
push %rdi
call print
pop %rdi
pop %rsi
# --num
dec %rdi
# countdown(num)
call countdown_recursive
end:
# ...}
mov $0, %eax
mov %rbp, %rsp
pop %rbp
ret
fast_ret:
mov $1, %eax
jmp *%rsi
Does the above look like a valid approach, passing the memory address I want to go back to in rsi? The function was incredibly tricky for me to write, but I think that's mainly because I'm pretty new/raw with assembly.

As well as returning to this alternate return address, you also need to restore the caller's (call-preserved) registers, not just the ones of your most recent parent. That includes RSP.
You're basically trying to re-invent C's setjmp / longjmp, which does exactly this, including resetting the stack pointer back to the scope where you called setjmp. I think a few of the questions in SO's setjmp tag are about implementing your own setjmp / longjmp in asm.
Also, to make this more efficient you might want to use a custom calling convention where the return address pointer (or a jmpbuf pointer after implementing the above) is in a call-preserved register like R15, so you don't have to save/restore it around print calls inside the body of your recursive function.
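To make the setjmp / longjmp comparison concrete, here is a minimal C sketch of the same countdown (the name interrupted_env is made up for illustration; setjmp returns 0 on the initial call and returns the value passed to longjmp when the simulated "interrupt" fires):
#include <setjmp.h>
#include <stdio.h>

static jmp_buf interrupted_env;            /* hypothetical name: the saved context */

static int countdown_recursive(int n) {
    if (n == 0)
        return 0;
    if (n == 7)                            /* the simulated "interrupt" */
        longjmp(interrupted_env, 1);       /* unwind straight back to setjmp, restoring RSP etc. */
    printf("%d\n", n);
    return countdown_recursive(n - 1);
}

static int countdown(int n) {
    if (setjmp(interrupted_env))           /* nonzero only when longjmp fired */
        return 1;                          /* interrupted */
    return countdown_recursive(n);
}

int main(void) {
    return countdown(8);
}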

Related

Recursion in Assembly x86: Fibonacci

I am trying to code a recursive fibonacci sequence in assembly, but it is not working for some reason.
It does not give an error, but the output number is always wrong.
section .bss
_bss_start:
; store max iterations
max_iterations resb 4
_bss_end:
section .text
global _start
; initialise the function variables
_start:
mov dword [max_iterations], 11
mov edx, 0
push 0
push 1
jmp fib
fib:
; clear registers
mov eax, 0
mov ebx, 0
mov ecx, 0
; handle fibonacci math
pop ebx
pop eax
add ecx, eax
add ecx, ebx
push eax
push ebx
push ecx
; increment counter / exit conditions
inc edx
cmp edx, [max_iterations]
je print
; recursive step
call fib
ret
print:
mov eax, 1
pop ebx
int 0x80
For instance, the above code prints a value of 79 rather than 144 (11th fibonacci number).
Alternatively, if I make
mov dword [max_iterations], 4
Then the above code prints 178 rather than 5 (5th fibonacci number).
Any one have an idea?
As an approach, you should try to debug it with the smallest possible input, like 1 iteration.  That will be most revealing, as you can watch it do the wrong thing in great detail without worrying about multiple recursions.  When that works, go to 2 iterations.
When you use complex addressing modes, it is harder to debug as we cannot see what the processor is doing.  So, when an instruction using a complex addressing mode doesn't work, and you want to debug it, then split that instruction into 2 instructions as follows:
mov dword [fibonacci_seq + edx + 4], ecx
---
lea esi, [fibonacci_seq + edx + 4]
mov [esi], ecx
With the alternate code sequence, you can observe the value of the addressing mode computation, which will provide you with additional debugging insight.
As another example:
cmp edx, [max_iterations]
---
mov edi, [max_iterations]
cmp edx, edi
Using the 2 instruction version, you will be able to see what value the processor is comparing edx with.
Or better, do that mov load once before the loop, so you're keeping the loop bound in a register the whole time. That's what you should normally do when you have enough registers, only using memory when you run out.
You are jmping to fib from one place in the code and calling it from another.  Though your logic happens to work because, once you've reached the limit, you never return to main, this is really bad form: mixing main code with a function.  More on that below...
mov dword [fibonacci_seq + edx + 4], ecx
Is this working for you? You're only incrementing edx by 1.  Perhaps you wanted:
mov dword [fibonacci_seq + edx * 4], ecx
I would argue that your code is not really recursive.
call fib ; jumps to fib, pushes a return address
ret ; never, ever reached, so, pointless
---
jmp fib ; iterate w/o pushing unwanted return address onto the stack
The 1-instruction jmp will be superior to the call as a mechanism to iterate, in part b/c it doesn't push an unnecessary return address onto the stack.
When you debug with 2 iterations, you'll probably see that the unused return address pushed by the call messes up your "parameter" passing, pops.
To expand on the "recursion": when the iteration stops and control transfers to print, there will be some 11 (depending on iteration count) unused return addresses on the stack (modulo the interference from the pops and pushes).
The recursive call is only used for iteration; the recursion never unwinds.  Thus, I would argue it's not recursive (not even tail recursive): it just erroneously pushes some unused return addresses onto the stack, and that's not recursion.
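For contrast, in a genuinely recursive formulation some work still happens after the recursive call returns, so the recursion has to unwind. A C sketch:
/* genuinely recursive: the addition runs after the recursive calls return,
   so every pushed return address is actually used on the way back up */
unsigned fib(unsigned n) {
    if (n < 2)
        return n;
    return fib(n - 1) + fib(n - 2);
}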
This line is your main problem:
mov dword [fibonacci_seq + edx + 4], ecx
Because of the +4, you never write to the first entry of the "array". And because you only increment EDX by 1, each write to the array overwrites 3 bytes of the previous entry. Try this instead:
mov dword [fibonacci_seq + edx * 4], ecx
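In C terms, the difference between the two addressing modes is roughly this (a sketch, assuming fibonacci_seq is an array of 4-byte values; the array name is just the one from the question):
#include <stdint.h>

uint32_t fibonacci_seq[16];          /* assumed backing array of 4-byte elements */

/* mov dword [fibonacci_seq + edx*4], ecx
   scales the index by the element size, i.e. ordinary array indexing: */
void store_scaled(uint32_t edx, uint32_t ecx) {
    fibonacci_seq[edx] = ecx;        /* address = base + edx * 4 */
}

/* mov dword [fibonacci_seq + edx + 4], ecx
   treats edx as a byte offset, so consecutive 4-byte stores land only
   1 byte apart and each one overwrites 3 bytes of the previous entry: */
void store_unscaled(uint32_t edx, uint32_t ecx) {
    *(uint32_t *)((char *)fibonacci_seq + edx + 4) = ecx;
}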
A bit of redesign (I did not realise that the call instruction used the stack in this way), and the solution is here:
section .bss
_bss_start:
; store max iterations and current iteration
max_iterations resb 4
iteration resb 4
; store arguments
n_0 resb 4
n_1 resb 4
_bss_end:
section .text
global _start
; initialise the function variables
_start:
mov dword [max_iterations], 11
mov dword [iteration], 0
mov dword [n_0], 0
mov dword [n_1], 1
jmp fib
fib:
mov ecx, 0
mov edx, 0
mov eax, [n_0]
mov ebx, [n_1]
add ecx, eax
add ecx, ebx
mov edx, [n_1]
mov dword [n_0], edx
mov dword [n_1], ecx
mov edx, [iteration]
inc edx
mov dword [iteration], edx
cmp edx, [max_iterations]
je print
call fib
ret
print:
mov eax, 1
mov ebx, [n_1]
int 0x80
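For reference, the redesigned assembly boils down to this C loop (a sketch only; the result comes out as the process exit status, because eax=1 with int 0x80 is the 32-bit exit system call, and the status is truncated to 8 bits):
#include <stdlib.h>

int main(void) {
    unsigned max_iterations = 11;
    unsigned n_0 = 0, n_1 = 1;

    for (unsigned iteration = 0; iteration != max_iterations; iteration++) {
        unsigned next = n_0 + n_1;   /* add ecx, eax / add ecx, ebx   */
        n_0 = n_1;                   /* mov [n_0], edx (old n_1)      */
        n_1 = next;                  /* mov [n_1], ecx                */
    }
    exit(n_1);                       /* mov eax, 1 / mov ebx, [n_1] / int 0x80 */
}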

When is tail recursion guaranteed in Rust?

C language
In the C programming language, it's easy to have tail recursion:
int foo(...) {
    return foo(...);
}
Just return the value of the recursive call as-is. This is especially important when the recursion may repeat a thousand or even a million times, since without tail-call elimination it would use a lot of stack memory.
Rust
Now, I have a Rust function that might recursively call itself a million times:
fn read_all(input: &mut dyn std::io::Read) -> std::io::Result<()> {
    match input.read(&mut [0u8]) {
        Ok(0) => Ok(()),
        Ok(_) => read_all(input),
        Err(err) => Err(err),
    }
}
(this is a minimal example, the real one is more complex, but it captures the main idea)
Here, the return value of the recursive call is returned as is, but:
Does it guarantee that the Rust compiler will apply a tail recursion?
For instance, if we declare some variable that needs to be destroyed like a std::Vec, will it be destroyed just before the recursive call (which allows for tail recursion) or after the recursive call returns (which forbids the tail recursion)?
Shepmaster's answer explains that tail call elimination is merely an optimization, not a guarantee, in Rust. But "never guaranteed" doesn't mean "never happens". Let's take a look at what the compiler does with some real code.
Does it happen in this function?
As of right now, the latest release of Rust available on Compiler Explorer is 1.39, and it does not eliminate the tail call in read_all.
example::read_all:
push r15
push r14
push rbx
sub rsp, 32
mov r14, rdx
mov r15, rsi
mov rbx, rdi
mov byte ptr [rsp + 7], 0
lea rdi, [rsp + 8]
lea rdx, [rsp + 7]
mov ecx, 1
call qword ptr [r14 + 24]
cmp qword ptr [rsp + 8], 1
jne .LBB3_1
movups xmm0, xmmword ptr [rsp + 16]
movups xmmword ptr [rbx], xmm0
jmp .LBB3_3
.LBB3_1:
cmp qword ptr [rsp + 16], 0
je .LBB3_2
mov rdi, rbx
mov rsi, r15
mov rdx, r14
call qword ptr [rip + example::read_all@GOTPCREL]
jmp .LBB3_3
.LBB3_2:
mov byte ptr [rbx], 3
.LBB3_3:
mov rax, rbx
add rsp, 32
pop rbx
pop r14
pop r15
ret
mov rbx, rax
lea rdi, [rsp + 8]
call core::ptr::real_drop_in_place
mov rdi, rbx
call _Unwind_Resume@PLT
ud2
Notice this line: call qword ptr [rip + example::read_all@GOTPCREL]. That's the (tail) recursive call. As you can tell from its existence, it was not eliminated.
Compare this to an equivalent function with an explicit loop:
pub fn read_all(input: &mut dyn std::io::Read) -> std::io::Result<()> {
    loop {
        match input.read(&mut [0u8]) {
            Ok(0) => return Ok(()),
            Ok(_) => continue,
            Err(err) => return Err(err),
        }
    }
}
which has no tail call to eliminate, and therefore compiles to a function with only one call in it (to the computed address of input.read).
Oh well. Maybe Rust isn't as good as C. Or is it?
Does it happen in C?
Here's a tail-recursive function in C that performs a very similar task:
int read_all(FILE *input) {
    char buf[] = {0, 0};
    if (!fgets(buf, sizeof buf, input))
        return feof(input);
    return read_all(input);
}
This should be super easy for the compiler to eliminate. The recursive call is right at the bottom of the function and C doesn't have to worry about running destructors. But nevertheless, there's that recursive tail call, annoyingly not eliminated:
call read_all
Tail call optimization is not guaranteed to happen in C, either. No compiler I tried would be convinced to turn this into a loop on its own initiative.
Since version 13, clang supports a non-standard musttail attribute you can add to tail calls that should be eliminated. Adding this attribute to the C code successfully eliminates the tail call. However, rustc currently has no equivalent attribute (although the become keyword is reserved for this purpose).
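For instance (clang 13+ only; musttail is a clang extension, not standard C), annotating the return in the C function above makes the compiler either emit a jump or refuse to compile:
#include <stdio.h>

int read_all(FILE *input) {
    char buf[] = {0, 0};
    if (!fgets(buf, sizeof buf, input))
        return feof(input);
    /* clang-only statement attribute: the call below must be compiled as a tail call */
    __attribute__((musttail)) return read_all(input);
}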
Does it ever happen in Rust?
Okay, so it's not guaranteed. Can the compiler do it at all? Yes! Here's a function that computes Fibonacci numbers via a tail-recursive inner function:
pub fn fibonacci(n: u64) -> u64 {
    fn f(n: u64, a: u64, b: u64) -> u64 {
        match n {
            0 => a,
            _ => f(n - 1, a + b, a),
        }
    }
    f(n, 0, 1)
}
Not only is the tail call eliminated, the whole inner function f is inlined into fibonacci, yielding only 12 instructions (and not a call in sight):
example::fibonacci:
push 1
pop rdx
xor ecx, ecx
.LBB0_1:
mov rax, rdx
test rdi, rdi
je .LBB0_3
dec rdi
add rcx, rax
mov rdx, rcx
mov rcx, rax
jmp .LBB0_1
.LBB0_3:
ret
If you compare this to an equivalent while loop, the compiler generates almost the same assembly.
What's the point?
You probably shouldn't be relying on optimizations to eliminate tail calls, either in Rust or in C. It's nice when it happens, but if you need to be sure that a function compiles into a tight loop, the surest way, at least for now, is to use a loop.
Neither tail recursion (reusing a stack frame for a tail call to the same function) nor tail call optimization (reusing the stack frame for a tail call to any function) is ever guaranteed by Rust, although the optimizer may choose to perform them.
Regarding "if we declare some variable that needs to be destroyed": it's my understanding that this is one of the sticking points, as changing the point at which stack variables are destroyed would be contentious.
See also:
Recursive function calculating factorials leads to stack overflow
RFC 81: guaranteed tail call elimination
RFC 1888: Proper tail calls

Nasm - access struct elements by value and by address

I started coding in NASM assembly lately and my problem is that I don't know how to access struct elements the right way. I already searched for solutions on this site and on Google, but everywhere I look people say different things. My program is crashing, and I have the feeling the problem lies in accessing the structs.
When looking at the example code:
STRUC Test
.normalValue RESD 1
.address RESD 1
ENDSTRUC
TestStruct:
istruc Test
at Test.normalValue dd ffff0000h
at Test.address dd 01234567h
iend
;Example:
mov eax, TestStruct ; moves pointer to first element to eax
mov eax, [TestStruct] ; moves content of the dereferenced pointer to eax (same as mov eax, ffff0000h)
mov eax, TestStruct
add eax, 4
mov ebx, eax ; moves pointer to the second element (4 because RESD 1)
mov eax, [TestStruct+4] ; moves content of the dereferenced pointer to eax (same as mov eax, 01234567h)
mov ebx, [eax] ; moves content at the address 01234567h to ebx
Is that right?
Help is appreciated
I don't know if you figured it out, but here is your code with some small modifications that make it work. All instructions are correct except the last one, mov ebx, [eax], which is expected to crash because you are trying to access the content at address 0x01234567, resulting in a SIGSEGV.
section .bss
struc Test
normalValue RESD 1
address RESD 1
endstruc
section .data
TestStruct:
istruc Test
at normalValue, dd 0xffff0000
at address, dd 0x01234567
iend
section .text
global _start
_start:
mov eax, TestStruct ; moves pointer to first element to eax
mov eax, [TestStruct] ; moves content of the dereferenced pointer to eax same as mov eax, ffff0000h
mov eax, TestStruct
add eax, 4
mov ebx, eax ; moves pointer to the second element 4 because RESD 1
mov eax, [TestStruct+4] ; moves content of the dereferenced pointer to eax same as mov eax, 01234567h
mov ebx, [eax] ; moves content at the address 01234567h to ebx
Compile, link and run step by step with nasm -f elf64 main.nasm -o main.o; ld main.o -o main; gdb main
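As a rough C analogy of what those instructions do (illustrative only; the field sizes mirror the two RESD 1 declarations above):
#include <stdint.h>

struct Test {
    uint32_t normalValue;   /* RESD 1 -> 4 bytes at offset 0 */
    uint32_t address;       /* RESD 1 -> 4 bytes at offset 4 */
};

struct Test TestStruct = { 0xffff0000u, 0x01234567u };

void example(void) {
    uint32_t eax, ebx;
    eax = (uint32_t)(uintptr_t)&TestStruct;  /* mov eax, TestStruct       */
    eax = TestStruct.normalValue;            /* mov eax, [TestStruct]     */
    eax = TestStruct.address;                /* mov eax, [TestStruct+4]   */
    ebx = *(uint32_t *)(uintptr_t)eax;       /* mov ebx, [eax] -- faults, */
                                             /* 0x01234567 isn't mapped   */
    (void)ebx;
}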

How does register %rbx hold the previous value in a recursive call

The C and assembly code:
long pcount_r(unsigned long x) {
    if (x == 0) return 0;
    else return (x & 1) + pcount_r(x >> 1);
}
pcount_r:
movl $0, %eax
testq %rdi, %rdi
je .L6
pushq %rbx
movq %rdi, %rbx
andl $1, %ebx
shrq %rdi
call pcount_r
addq %rbx, %rax
popq %rbx
.L6:
rep; ret
If we pass the function a value x = 5 (binary representation 0101), register %rdi holds the x value, and %rbx gets a copy of it via "movq %rdi, %rbx" before "x>>1" is passed to the next recursive call. Therefore, %rbx holds 0101, then 010, then 01.
The first call of pcount_r:
%rdi %rbx
0101 --> 0101
The second call of pcount_r:
%rdi %rbx
010 --> 010
The third call and etc...
%rbx can only hold one value at a time. To me, it seems like this just overwrites the previous value of %rbx rather than saving data on a stack. My question is: how can %rbx restore the previous %rbx value when a recursive call ends?
more details about this function can be found in this link.
Recursive Procedures at time 55:00
I see. My mistake was thinking that register %rbx holds the x value. In fact, %rbx only holds the value temporarily: the instruction "pushq %rbx" pushes the caller's %rbx value onto the stack before the new value is moved into %rbx, and that saved value is later restored by "popq %rbx".
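In C terms, the push/pop pair is what gives every recursive call its own saved copy, just like a local variable that lives in that call's stack frame. A rough sketch:
long pcount_r(unsigned long x) {
    if (x == 0)
        return 0;
    long bit = x & 1;              /* the per-frame copy kept in %rbx             */
    long rest = pcount_r(x >> 1);  /* callee pushes/pops %rbx, so this frame's    */
                                   /* copy is still intact after the call returns */
    return bit + rest;
}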

x86 pointers in commands

I'm new to x86.
I know what this kind of thing, without the parentheses, means:
*command* %eax, %ebx
But how are these different, and what do they mean?
*command* %eax, (%ebx)
*command* (%eax), %ebx
*command* (%eax, %ebx, 4), %ecx
I think your question is, "what do the parentheses around a register's name mean/do?" At a high level, the parentheses say to load from the memory address held in the register and use that value in the instruction. I.e., whereas
*command* %eax, %ebx
operates on the values in the %eax and %ebx registers directly,
*command* (%eax), (%ebx)
loads the values from memory pointed to by %eax and %ebx and operates on them (note that a real x86 instruction allows at most one memory operand, so this form is only for illustration). There are actually a few more variants of the parentheses than you listed. For a description of them (including the last instruction example that you asked about), check here.
Hope that helps. Feel free to post back if you have any more questions.
Assume the following operations:
movl %eax, (%ebx) [1]
movl (%eax), %ebx [2]
movl (%eax, %ebx, 4), %ecx [3]
1. The first one will copy the value of eax to the address stored in ebx, similar to this in C:
*(int *)ebx = eax; // store eax at the address held in ebx
2. The second will copy the value stored at the address in eax into ebx:
ebx = *(int *)eax; // load the value at the address held in eax into ebx
3. This is an array-style access, where eax is the base pointer (p below), ebx is the index, and 4 is the size of an element of the array:
ecx = ((int *) p)[ebx];
calculated as:
ecx = *(int *)((char *)p + ebx * sizeof(int));
In AT&T asm syntax, parentheses mean "dereference" -- roughly the same as the unary * operator in C. Keep in mind that AT&T order is source first, destination second, so some rough equivalences:
movl %eax, %ebx     ebx = eax
movl %eax, (%ebx)   *ebx = eax
movl (%eax), %ebx   ebx = *eax
That leaves your last example:
movl (%eax, %ebx, 4), %ecx
In this case, there are multiple values that are combined to form the address to dereference. It's roughly equivalent to
ecx = *(eax + ebx*4)
