I need to translate my recursive function into a stack-form, since my program ends up with a stack overflow. I am using Unity3D C#, and somewhere I saw that it's something like 5MB size for the program.
My question is, will the stack of references to items be held on the heap or still in the stack? I guess it's pretty obvious that it will go on the heap, but it's not clear to me why.
Thank you!
Whenever you create a new object (reference type) it goes to the heap (dynamic allocation), so when you'll use a new instance of Stack (data structure, not to be confused with memory stack) it'll be allocated on the heap. Stack is a reference type, not a value type.
Related
I have been working on implementing a priority queue of strings in c++ using a binary tree.
As I think the simplicity of recursion is great. I am not going to post code as I have already spent a long time today with the debugger and I am not asking for someone to debug for me, but basically after implementing recursive methods to dequeue and insert elements and testing correct behavior with up to 1000 random strings I have used a test hub that tries to enqueue 10000 random strings and I have a stack overflow error. After this, I have changed my recursive methods for others that use a pointer cursor to scan my tree to insert and dequeue using the same logic and it has not crashed as I expected (i have coded it almost as a linked list).
The question is then, Can I cause stack overflow through recursion even if I use to pass by reference?
These recursive methods are part of a class and defined as private.
I hope the question is not vague but I am still not experienced enough in c++.
Thanks a lot for your help!
In recursion you're calling your function again and again. On every call you use the stack memory for parameters, stack variables and more. So basicly the answer is definitely yes, a deep recursion can cause a stack overflow.
I was reading some basic articles about memory manipulation by the processor, and I was confused as to how the processor handles what comes next.
The concept of the call stack is clear, but I was wondering if the expression stack/register stack (used to make the calculations) is the same stack, or even if the stack for the local variables of a subroutine (a function) in a program is the same call stack.
If anyone could explain to me how the processor operates regarding its stack(s), that'd help me a lot.
All the processors I've worked on have just used a single stack for these.
If you think about what the processor is doing, you only need a single stack. During calculations you can use the same stack as the calling stack, as when the calculation is complete the stack will be 'clean' again. Same for local variables, just before you go out of the scope of the local variables your stack will be clean allowing the call to return correctly.
You can change the stack just set the SS:SP segment and pointer registers (just save the current values)
The procedure call parameters and local variables takes place in the stack. And the dynamically created objects take place in the heap (DS:DI). The SS:SP register pair shifted by the right amount of bytes to reserve the needed memory on the procedure call. And on the return the SS:SP sets back to the pre call state.
Exactly what parts of a recursive method call contributes to the stack--say, the returned object, arguments, local variables, etc.?
I'm trying to optimize the levels of recursion that an Android application can do on limited memory before running into a StackOverflowException.
Thanks in advance.
If you run out of stack space, don't optimize your stack usage. Doing that just means the same problem will come back later, with a slightly larger input set or called from somewhere else. And at some point you have reached the theoretical or practical minimum of space you can consume for the problem you're solving. Instead, convert the offending code to use a collection other than the machine stack (e.g. a heap-allocated stack or queue). Doing so sometimes results in very ugly code, but at least it won't crash.
But to answer the question: Generally all the things you name can take stack space, and temporary values take space too (so nesting expressions like crazy just to save local variables won't help). Some of these will be stored in registers, depending on the calling convention, but may have to be spilled(*) anyway. But regardless of the calling convention, this only saves you a few bytes, and everything will have to be spilled for calls as the callee is given usually free reign over registers during the call. So at the time your stack overflows, the stack is indeed crowded with parameters, local variables, and temporaries of earlier calls. Some may be optimized away altogether or share a stack slot if they aren't needed at the same time. Ultimately this is up to the JIT compiler.
(*) Spilling: Moving a value from a register to memory (i.e., the stack) because the register is needed for something else.
Each method has two stack frame sizes associated with it, the stack required for arguments and local variables, and the stack required for expression evaluation. The return value only counts as part of the stack required for expression evaluation. The JVM is able to verify that the method does not exceed these sizes as it executes.
Exactly how much stack is required for variables and expression evaluation is down to the bytecode compiler. For instance it is often able to share local variable slots among variables with non-overlapping lifetimes.
Coming from the Symbian world, I'm used to using the heap as much as possible to avoid running out of stack space, especially when handling descriptors. CBase derived classes were always dynamically allocated on the heap, since if they were not, their member variables would stay uninitialized. Does the same convention apply to QObject-derived classes?
In Qt it seems to be common to put, for example QString, on the stack. Are the string contents put on the heap while QString acts as a container on the stack, or is everything put on the stack?
As sje397 said: It's idiomatic to put QString and containers on the stack, as they are implicitly shared. Their internals (pimpl idiom "d" pointer) are created on the heap. There is no point in creating the object itself on the heap, too. Just causes memory-management hassles and you lose the intended copy-on-write properties when passing pointers to strings/containers around.
QObjects on the other hand you want to create on the heap in almost all cases, as otherwise they would be destructed again right away. They can't be copied or assigned (well, one can enforce it for own subclasses, but the QObject semantics are broken then), and usually they are supposed to survive the method body they are created in.
Exception is QDialog, which is often created on the stack, followed by QDialog::exec, which blocks until the dialog is closed. But even that is strictly spoken unsafe, as external events (RPC calls, background operations) could cause the dialog to be deleted by its parent (if the parent itself is deleted) before exec returns.
Then having the dialog created on the stack will cause double deletion when unwinding the stack -> crash.
QString, and many other Qt classes, use implicit data sharing. That implies that memory is generally allocated on the heap.
is there a way to implement multitasking using setjmp and longjmp functions
You can indeed. There are a couple of ways to accomplish it. The difficult part is initially getting the jmpbufs which point to other stacks. Longjmp is only defined for jmpbuf arguments which were created by setjmp, so there's no way to do this without either using assembly or exploiting undefined behavior. User level threads are inherently not portable, so portability isn't a strong argument for not doing it really.
step 1
You need a place to store the contexts of different threads, so make a queue of jmpbuf stuctures for however many threads you want.
Step 2
You need to malloc a stack for each of these threads.
Step 3
You need to get some jmpbuf contexts which have stack pointers in the memory locations you just allocated. You could inspect the jmpbuf structure on your machine, find out where it stores the stack pointer. Call setjmp and then modify its contents so that the stack pointer is in one of your allocated stacks. Stacks usually grow down, so you probably want your stack pointer somewhere near the highest memory location. If you write a basic C program and use a debugger to disassemble it, and then find instructions it executes when you return from a function, you can find out what the offset ought to be. For example, with system V calling conventions on x86, you'll see that it pops %ebp (the frame pointer) and then calls ret which pops the return address off the stack. So on entry into a function, it pushes the return address and frame pointer. Each push moves the stack pointer down by 4 bytes, so you want the stack pointer to start at the high address of the allocated region, -8 bytes (as if you just called a function to get there). We will fill the 8 bytes next.
The other thing you can do is write some very small (one line) inline assembly to manipulate the stack pointer, and then call setjmp. This is actually more portable, because in many systems the pointers in a jmpbuf are mangled for security, so you can't easily modify them.
I haven't tried it, but you might be able to avoid the asm by just deliberately overflowing the stack by declaring a very large array and thus moving the stack pointer.
Step 4
You need exiting threads to return the system to some safe state. If you don't do this, and one of the threads returns, it will take the address right above your allocated stack as a return address and jump to some garbage location and likely segfault. So first you need a safe place to return to. Get this by calling setjmp in the main thread and storing the jmpbuf in a globally accessible location. Define a function which takes no arguments and just calls longjmp with the saved global jmpbuf. Get the address of that function and copy it to your allocated stacks where you left room for the return address. You can leave the frame pointer empty. Now, when a thread returns, it will go to that function which calls longjmp, and jump right back into the main thread where you called setjmp, every time.
Step 5
Right after the main thread's setjmp, you want to have some code that determines which thread to jump to next, pulling the appropriate jmpbuf off the queue and calling longjmp to go there. When there are no threads left in that queue, the program is done.
Step 6
Write a context switch function which calls setjmp and stores the current state back on the queue, and then longjmp on another jmpbuf from the queue.
Conclusion
That's the basics. As long as threads keep calling context switch, the queue keeps getting repopulated, and different threads run. When a thread returns, if there are any left to run, one is chosen by the main thread, and if none are left, the process terminates. With relatively little code you can have a pretty basic cooperative multitasking setup. There are more things you probably want to do, like implement a cleanup function to free the stack of a dead thread, etc. You can also implement preemption using signals, but that is much more difficult because setjmp doesn't save the floating point register state or the flags registers, which are necessary when the program is interrupted asynchronously.
It may be bending the rules a little, but GNU pth does this. It's possible, but you probably shouldn't try it yourself except as an academic proof-of-concept exercise, use the pth implementation if you want to do it seriously and in a remotely portable fashion -- you'll understand why when you read the pth thread creation code.
(Essentially it uses a signal handler to trick the OS into creating a fresh stack, then longjmp's out of there and keeps the stack around. It works, evidently, but it's sketchy as hell.)
In production code, if your OS supports makecontext/swapcontext, use those instead. If it supports CreateFiber/SwitchToFiber, use those instead. And be aware of the disappointing truth that one of the most compelling use of coroutines -- that is, inverting control by yielding out of event handlers called by foreign code -- is unsafe because the calling module has to be reentrant, and you generally can't prove that. This is why fibers still aren't supported in .NET...
This is a form of what is known as userspace context switching.
It's possible but error-prone, especially if you use the default implementation of setjmp and longjmp. One problem with these functions is that in many operating systems they'll only save a subset of 64-bit registers, rather than the entire context. This is often not enough, e.g. when dealing with system libraries (my experience here is with a custom implementation for amd64/windows, which worked pretty stable all things considered).
That said, if you're not trying to work with complex external codebases or event handlers, and you know what you're doing, and (especially) if you write your own version in assembler that saves more of the current context (if you're using 32-bit windows or linux this might not be necessary, if you use some versions of BSD I imagine it almost definitely is), and you debug it paying careful attention to the disassembly output, then you may be able to achieve what you want.
I did something like this for studies.
https://github.com/Kraego/STM32L476_MiniOS/blob/main/Usercode/Concurrency/scheduler.c
The context/thread switching is done by setjmp/longjmp. The difficult part was to get the allocated stack correct (see allocateStack()) this depends on your platform.
This is just a demonstration how this could work, I would never use this in production.
As was already mentioned by Sean Ogden,
longjmp() is not good for multitasking, as
it can only move the stack upward and can't
jump between different stacks. No go with that.
As mentioned by user414736, you can use getcontext/makecontext/swapcontext
functions, but the problem with those is that
they are not fully in user-space. They actually
call the sigprocmask() syscall because they switch
the signal mask as part of the context switching.
This makes swapcontext() much slower than longjmp(),
and you likely don't want the slow co-routines.
To my knowledge there is no POSIX-standard solution to
this problem, so I compiled my own from different
available sources. You can find the context-manipulating
functions extracted from libtask here:
https://github.com/dosemu2/dosemu2/tree/devel/src/base/lib/mcontext
The functions are:
getmcontext(), setmcontext(), makemcontext() and swapmcontext().
They have the similar semantic to the standard functions with similar names,
but they also mimic the setjmp() semantic in that getmcontext()
returns 1 (instead of 0) when jumped to by setmcontext().
On top of that you can use a port of libpcl, the coroutine library:
https://github.com/dosemu2/dosemu2/tree/devel/src/base/lib/libpcl
With this, it is possible to implement the fast cooperative user-space
threading. It works on linux, on i386 and x86_64 arches.