How are a JUMP and a CALL instruction different? How do they relate to higher-level concepts such as a GOTO or a procedure call? (Am I correct in the comparison?)
This is what I think:
JUMP or GOTO is a transfer of control to another location, and control does not automatically return to the point from which the jump was made.
On the other hand, a CALL or procedure/function call returns to the point from which it was called. Because of this difference, languages typically make use of a stack, and a stack frame is pushed to "remember" the location to return to for each procedure called. This behaviour applies to recursive procedures too. In the case of tail recursion, however, there is no need to push a stack frame for each call.
Your answers and comments will be much appreciated.
You're mostly right, if you are talking about CALL/JMP in x86 assembly or something similar. The main difference is:
JMP performs a jump to a location, without doing anything else
CALL pushes the address of the instruction after the CALL (the return address) onto the stack, and then JMPs to the location. With a RET you can get back to where you were.
CALL is essentially a convenience: you could get the same effect with an explicit push and a JMP, something like
pushl $afterJmp       # push the return address by hand
jmp   location        # transfer control
afterJmp:
instead of a CALL.
You're exactly right about the difference between a jump and a call.
In the simple case of a single function with tail recursion, the compiler may be able to reuse the existing stack frame. However, it can become more complicated with mutually recursive functions:
#include <stdio.h>
void pong(void);
void ping(void) { printf("ping\n"); pong(); }
void pong(void) { printf("pong\n"); ping(); }
Consider the case where ping() and pong() are more complex functions that take different numbers of parameters. Mark Probst's paper talks about tail recursion implementation for GCC in great detail.
I think you've got the general idea.
It depends on the architecture, but in general, at the hardware level:
A jump instruction will change the program counter to continue execution at a different part of the program.
A call instruction will push the current program location (or current location + 1) onto the call stack and jump to another part of the program. A return instruction will then pop the location off of the call stack and jump back to the original location (or original location + 1).
So, a jump instruction is close to a GOTO, while a call instruction is close to a procedural/function call.
Also, for the reason that a call stack is used when making function calls, pushing too many return addresses to the call stack by recursion will cause a stack overflow.
When learning assembly, I find RISC processors easier to deal with than x86 processors, as they tend to have fewer instructions and simpler operations.
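To tie this back to the higher-level analogy in the question, here is a minimal C sketch (illustrative only, not from the original answer) contrasting a goto, which never comes back, with a function call, which does:
#include <stdio.h>

void procedure(void) {            /* reached via a call: control comes back */
    printf("inside the procedure\n");
}

int main(void) {
    procedure();                  /* like CALL: execution resumes on the next line */
    printf("back after the call\n");

    goto skip;                    /* like JMP/GOTO: a one-way transfer of control */
    printf("this line is never reached\n");
skip:
    printf("after the goto\n");
    return 0;
}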
One correction to your thoughts: it is not only with tail recursion, but with tail calls in general, that we don't need a new stack frame and hence can simply JMP there (provided the arguments have been set up correctly).
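For example, here is a small illustrative C sketch (my addition, not part of the answer): with an optimizing compiler such as gcc or clang at -O2, the tail call below is typically compiled into a plain jmp into helper, reusing caller's stack frame:
int helper(int x) {
    return x * 2;
}

int caller(int x) {
    /* Tail call: nothing is left to do after helper() returns,
       so the call can be turned into a jump. */
    return helper(x + 1);
}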
At the microprocessor level, a (conditional) jump first checks the condition and then transfers control to the other code; it does not return.
A call operation is like a function call in the C language: when the called function has finished, control returns to the caller so it can complete its execution.
Related
We can print a linked list in reverse order with a stack as well as by using recursion. My teacher said that using an explicit stack is better, since recursion also uses a stack but has to maintain a lot of other parameters. Even if we use std::stack from <stack>, doesn't referring to an external library also take up time? How does using an explicit stack save time/space compared to a recursive solution?
Recursion involves the use of implicit stacks. This is implemented in the background by the compiler being used to compile your code.
This background stack created by the compiler is known as a ‘Call stack’. Call Stack can be implemented using stack data structure which stores information about the active subroutines of a computer program.
Each subroutine call uses a frame inside the call stack, called the stack frame. When a function returns, its stack frame is popped off the call stack.
Recursion's Call Stack vs Explicit Call Stack?
Stack overflow
The fundamental difference between the two stacks is that the space allocated by the compiler for the call stack of a program is fixed. This means there will be a stack overflow if you're not sure about the maximum number of recursive calls expected and there end up being more calls than the space allocated to the stack can handle at a given point in time.
On the other hand, if you define an explicit stack, it's implemented in the heap space allocated to the program at run time. And guess what, the heap size is not fixed and can grow dynamically at run time when required, so you don't really have to worry about the explicit stack overflowing.
Space and Time
Which one will be faster for a given situation?
Iterating on an explicit stack can be faster than recursion in languages that don’t support recursion related optimizations such as tail call optimization for tail recursion.
What’s Tail recursion?
Tail recursion is a special case of recursion where the recursive function doesn't do any more computation after the recursive call, i.e. the last step of the function is the recursive call.
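As a small illustrative C sketch (my addition): the first factorial below is not tail recursive, because the multiplication still happens after the recursive call returns; the accumulator version is tail recursive, because the recursive call is the very last step:
/* Not tail recursive: must still multiply after fact(n - 1) returns. */
unsigned long fact(unsigned n) {
    if (n == 0) return 1;
    return n * fact(n - 1);
}

/* Tail recursive: the recursive call is the last thing the function does. */
unsigned long fact_acc(unsigned n, unsigned long acc) {
    if (n == 0) return acc;
    return fact_acc(n - 1, n * acc);
}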
What’s Tail-call optimization (TCO)?
Tail-call optimization is where you are able to avoid allocating a new stack frame for a function because the calling function will simply return the value that it gets from the called function.
So, compilers/languages that support tail-call optimizations implement the call to the recursive function with only a single stack frame in the call stack. If your compiler/language doesn’t support this, then using an explicit stack will save you a LOT of space and time.
Python doesn’t support tail call optimization. The main reason for this is to have a complete and clear stack trace which enables efficient debugging. Almost all C/C++ compilers support tail call optimization.
Sometimes explicitly controlling the stack helps simplify things when multiple parameters are being used, whereas a recursive solution usually makes the source code a lot smaller and more maintainable.
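As a rough C sketch of the two approaches from the question (using a malloc'd array as a stand-in for std::stack; max_nodes is an assumed upper bound on the list length, not something from the original post):
#include <stdio.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

/* Recursive version: the call stack remembers each node for us. */
void print_reverse_recursive(const struct node *head) {
    if (head == NULL) return;
    print_reverse_recursive(head->next);
    printf("%d\n", head->value);
}

/* Explicit-stack version: a heap-allocated array plays the same role. */
void print_reverse_explicit(const struct node *head, size_t max_nodes) {
    const struct node **stack = malloc(max_nodes * sizeof *stack);
    size_t top = 0;
    if (stack == NULL) return;
    for (const struct node *p = head; p != NULL; p = p->next)
        stack[top++] = p;
    while (top > 0)
        printf("%d\n", stack[--top]->value);
    free(stack);
}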
Conclusion
In the end, there's no fixed answer. For a particular scenario, many factors need to be considered, such as scalability, code maintainability, the language/compiler being used, etc.
The best way would be to implement the solution using both ways, time the 2 solutions on an input set and analyze peak space utilization before deploying it on a production setup.
See Wikipedia - Recursion versus Iteration
I am currently taking a course on the POSIX thread library, using the Solaris operating system as an example, and I got this question from my university teacher:
What exactly happens when a child thread meets on its way a return statement? (I answered that in this case an implicit call to pthread_exit is made, but my answer did not satisfy him, so he asked another one which is below).
What happens if this function is not the thread's start function, i.e. how does the OS know on which return statements it is obliged to return control (and make an implicit call to pthread_exit) and on which it is not?
I guess I can paraphrase this as : "What exactly happens when a thread stumbles over a return statement?"
I don't know how to answer this question, so any help would be really appreciated!
P.S.
In fact when I looked through the assembly code of my program I could see that no implicit call to pthread_exit was made:
#include <stdio.h>

void* print_some_lines(void* args) {
printf("Child printing..\n");
int i;
for (i = 0; i < 10; ++i) {
printf("Child's line number %d \n", i);
}
printf("Child done printing\n");
return NULL;
}
And
return NULL;
}
equals
mov eax, 0
leave
ret
in assembly
What exactly happens when a child thread meets on its way a return statement? (I answered that in this case an implicit call to pthread_exit is made, but my answer did not satisfy him, so he asked another one which is below).
As I wrote in comments, the Pthreads specification does not speak to the question, but after some consideration, I am inclined to say that the only plausible answers are all along the lines that the function returns control to its caller, just like any other time a return is executed. My take, then, is that you are overthinking the question, and that's why you're having trouble finding an answer.
Note well that Pthreads does not distinguish "child threads" as a category -- once started, threads are all on equal ground. Even the program's initial thread. For example, any thread can join any other joinable thread, including the program's initial thread if that one happens to terminate by calling pthread_exit(). Moreover, threads' start functions can be called directly, too, and they can be compiled separately from whatever function uses them with pthread_create(), so Pthreads cannot rely on there to be anything special about how they are compiled.
At the C language level, then, there's nothing special about thread start functions (unless you count their required prototype), nor about the functions they call. All those functions have ordinary C semantics, including the ordinary semantics of the return statements they execute, as if the thread in which they run were the only thread in the program. There are indeed special semantics invoked when the initial call to a thread's start function returns to its caller, but that's a responsibility of the Pthreads implementation, outside the scope of the C language.
I'd go so far as to opine that it is indeed a bit off to say even that a return from the initial call to a thread's start function implicitly calls pthread_exit(). Certainly the result observable by other threads is as if pthread_exit(NULL) were called, but it would be better to associate that with the Pthreads implementation itself than to imply that execution of the return statement has special, context-sensitive semantics.
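A small sketch of that observable behaviour (my addition; compile with -pthread): the value returned from the start function is exactly what pthread_join() reports, the same as if the function had ended with pthread_exit((void *)42):
#include <pthread.h>
#include <stdio.h>

static void *start_routine(void *arg) {
    (void)arg;
    return (void *)42;            /* an ordinary return from the start function */
}

int main(void) {
    pthread_t t;
    void *result;

    pthread_create(&t, NULL, start_routine, NULL);
    pthread_join(t, &result);
    printf("thread result: %ld\n", (long)result);
    return 0;
}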
I was reading some basic articles about memory manipulation by the processor, and I was confused as to how the processor handles what comes next.
The concept of the call stack is clear, but I was wondering if the expression stack/register stack (used to make the calculations) is the same stack, or even if the stack for the local variables of a subroutine (a function) in a program is the same call stack.
If anyone could explain to me how the processor operates regarding its stack(s), that'd help me a lot.
All the processors I've worked on have just used a single stack for these.
If you think about what the processor is doing, you only need a single stack. During calculations you can use the same stack as the calling stack, because when the calculation is complete the stack will be 'clean' again. The same goes for local variables: just before you go out of the scope of the local variables, the stack will be clean again, allowing the call to return correctly.
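An illustrative C sketch of that point (my addition; the exact addresses are platform dependent): the locals of nested calls all land on the same stack, at successively lower addresses on the usual downward-growing stacks:
#include <stdio.h>

void inner(void) {
    int local = 2;
    printf("inner's local: %p\n", (void *)&local);
}

void outer(void) {
    int local = 1;
    printf("outer's local: %p\n", (void *)&local);
    inner();                      /* inner's frame is pushed below this one */
}

int main(void) {
    int local = 0;
    printf("main's  local: %p\n", (void *)&local);
    outer();
    return 0;
}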
You can change the stack by just setting the SS:SP segment and pointer registers (saving the current values first).
The procedure call parameters and local variables live on the stack, while dynamically created objects live on the heap (DS:DI). The SS:SP register pair is shifted by the right number of bytes to reserve the needed memory on a procedure call, and on return SS:SP is set back to its pre-call state.
I am planning to write an electric circuit simulator in the Racket language.
To do this I would have to save the initial electrical state of the whole circuit in any form (a list in case of Racket), and repeatedly pass this value in a function until the circuit state gets into a given time.
But wouldn't this, passing data into a function repeatedly, grow the stack and eventually have an impact on the program's performance?
I've heard that in the case of recursive functions, at compile time the code first expands to the final stage where the recursion finishes and then gets evaluated one call at a time, starting from the most deeply nested one.
If the same applies on this situation (not only mine but in any program that includes a state machine) should I rely on mutable data structures the language rather reluctantly offers?
After reading a bunch of articles praising FP I am trying to make the switch as well. Looking back at the days when I went through these cases drinking the mutable-state kool-aid like crazy, I now feel like a criminal.
If the question is another duplicate or a similar one please give me a link, I would take it happily and shut this one down (or can I?).
If the "main loop" of your simulator permits tail call elimination, the "stack" won't grow endlessly. Essentially there is no need to "push" the arguments on the "stack" for a function call; instead you jump to the start of the function, reusing the arguments.
Try searching here for [racket] tail recursion or [racket] tail call to find examples of tail position (or not).
Is there a way to implement multitasking using the setjmp and longjmp functions?
You can indeed. There are a couple of ways to accomplish it. The difficult part is initially getting the jmpbufs which point to other stacks. longjmp is only defined for jmpbuf arguments which were created by setjmp, so there's no way to do this without either using assembly or exploiting undefined behavior. User-level threads are inherently not portable anyway, so portability isn't a strong argument for not doing it.
Step 1
You need a place to store the contexts of different threads, so make a queue of jmpbuf structures for however many threads you want.
Step 2
You need to malloc a stack for each of these threads.
Step 3
You need to get some jmpbuf contexts which have stack pointers in the memory locations you just allocated. You could inspect the jmpbuf structure on your machine and find out where it stores the stack pointer. Call setjmp and then modify its contents so that the stack pointer is in one of your allocated stacks. Stacks usually grow down, so you probably want your stack pointer somewhere near the highest memory location.
If you write a basic C program and use a debugger to disassemble it, and then find the instructions it executes when you return from a function, you can find out what the offset ought to be. For example, with System V calling conventions on x86, you'll see that it pops %ebp (the frame pointer) and then calls ret, which pops the return address off the stack. So on entry into a function, it pushes the return address and frame pointer. Each push moves the stack pointer down by 4 bytes, so you want the stack pointer to start at the high address of the allocated region, minus 8 bytes (as if you just called a function to get there). We will fill the 8 bytes next.
The other thing you can do is write some very small (one line) inline assembly to manipulate the stack pointer, and then call setjmp. This is actually more portable, because in many systems the pointers in a jmpbuf are mangled for security, so you can't easily modify them.
I haven't tried it, but you might be able to avoid the asm by just deliberately overflowing the stack by declaring a very large array and thus moving the stack pointer.
Step 4
You need exiting threads to return the system to some safe state. If you don't do this, and one of the threads returns, it will take the address right above your allocated stack as a return address and jump to some garbage location and likely segfault. So first you need a safe place to return to. Get this by calling setjmp in the main thread and storing the jmpbuf in a globally accessible location. Define a function which takes no arguments and just calls longjmp with the saved global jmpbuf. Get the address of that function and copy it to your allocated stacks where you left room for the return address. You can leave the frame pointer empty. Now, when a thread returns, it will go to that function which calls longjmp, and jump right back into the main thread where you called setjmp, every time.
Step 5
Right after the main thread's setjmp, you want to have some code that determines which thread to jump to next, pulling the appropriate jmpbuf off the queue and calling longjmp to go there. When there are no threads left in that queue, the program is done.
Step 6
Write a context switch function which calls setjmp and stores the current state back on the queue, and then longjmps to another jmpbuf from the queue.
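A rough sketch of Steps 1, 5 and 6 together (my addition, heavily hedged: the names run_queue, yield_to_next and MAX_CONTEXTS are made up for illustration, the stack-allocation tricks from Steps 2-4 are omitted, the queue is assumed never to fill up, and jumping between stacks like this is exactly the kind of undefined behaviour discussed above):
#include <setjmp.h>

#define MAX_CONTEXTS 8

/* Step 1: a queue of saved contexts. */
static jmp_buf run_queue[MAX_CONTEXTS];
static int q_head = 0, q_count = 0;

/* Step 6: cooperative context switch -- save our own context at the back
   of the queue, then longjmp into the context at the front. */
static void yield_to_next(void)
{
    if (q_count == 0)
        return;                               /* nobody else to switch to */

    int back = (q_head + q_count) % MAX_CONTEXTS;
    if (setjmp(run_queue[back]) == 0) {
        int front = q_head;
        q_head = (q_head + 1) % MAX_CONTEXTS; /* our context replaces the one
                                                 we resume, so q_count stays */
        longjmp(run_queue[front], 1);         /* resume the next thread */
    }
    /* setjmp returned nonzero: another thread switched back here later. */
}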
Conclusion
That's the basics. As long as threads keep calling the context switch, the queue keeps getting repopulated, and different threads run. When a thread returns, if there are any left to run, one is chosen by the main thread, and if none are left, the process terminates. With relatively little code you can have a pretty basic cooperative multitasking setup. There are more things you probably want to do, like implementing a cleanup function to free the stack of a dead thread, etc. You can also implement preemption using signals, but that is much more difficult because setjmp doesn't save the floating-point register state or the flags register, which are necessary when the program is interrupted asynchronously.
It may be bending the rules a little, but GNU pth does this. It's possible, but you probably shouldn't try it yourself except as an academic proof-of-concept exercise, use the pth implementation if you want to do it seriously and in a remotely portable fashion -- you'll understand why when you read the pth thread creation code.
(Essentially it uses a signal handler to trick the OS into creating a fresh stack, then longjmp's out of there and keeps the stack around. It works, evidently, but it's sketchy as hell.)
In production code, if your OS supports makecontext/swapcontext, use those instead. If it supports CreateFiber/SwitchToFiber, use those instead. And be aware of the disappointing truth that one of the most compelling uses of coroutines -- that is, inverting control by yielding out of event handlers called by foreign code -- is unsafe because the calling module has to be reentrant, and you generally can't prove that. This is why fibers still aren't supported in .NET...
This is a form of what is known as userspace context switching.
It's possible but error-prone, especially if you use the default implementations of setjmp and longjmp. One problem with these functions is that on many operating systems they only save a subset of the 64-bit registers, rather than the entire context. This is often not enough, e.g. when dealing with system libraries (my experience here is with a custom implementation for amd64/windows, which worked pretty stably, all things considered).
That said, if you're not trying to work with complex external codebases or event handlers, if you know what you're doing, and (especially) if you write your own version in assembler that saves more of the current context (on 32-bit Windows or Linux this might not be necessary; on some versions of BSD I imagine it almost definitely is), and if you debug it paying careful attention to the disassembly output, then you may be able to achieve what you want.
I did something like this for studies.
https://github.com/Kraego/STM32L476_MiniOS/blob/main/Usercode/Concurrency/scheduler.c
The context/thread switching is done with setjmp/longjmp. The difficult part was getting the allocated stack correct (see allocateStack()); this depends on your platform.
This is just a demonstration how this could work, I would never use this in production.
As was already mentioned by Sean Ogden, longjmp() is not good for multitasking, as it can only move the stack upward and can't jump between different stacks. No go with that.
As mentioned by user414736, you can use the getcontext/makecontext/swapcontext functions, but the problem with those is that they are not fully in user-space. They actually call the sigprocmask() syscall because they switch the signal mask as part of the context switching. This makes swapcontext() much slower than longjmp(), and you likely don't want the slow co-routines.
To my knowledge there is no POSIX-standard solution to this problem, so I compiled my own from different available sources. You can find the context-manipulating functions extracted from libtask here:
https://github.com/dosemu2/dosemu2/tree/devel/src/base/lib/mcontext
The functions are getmcontext(), setmcontext(), makemcontext() and swapmcontext(). They have similar semantics to the standard functions with similar names, but they also mimic the setjmp() semantics in that getmcontext() returns 1 (instead of 0) when jumped to by setmcontext().
On top of that you can use a port of libpcl, the coroutine library:
https://github.com/dosemu2/dosemu2/tree/devel/src/base/lib/libpcl
With this, it is possible to implement fast cooperative user-space threading. It works on Linux, on the i386 and x86_64 architectures.