Real-time control of Windows Console game - console

Another quick question: I want to make a simple console-based game, nothing too fancy, just to have a weekend project to get more familiar with C. Basically I want to make Tetris, but I keep running into one problem:
How do I let the game engine keep running and, at the same time, wait for input? Obviously cin or scanf is useless for me, since both block.

You're looking for a library such as ncurses.
Many Rogue-like games are written using ncurses or similar.

There are two ways to do it:
The first is to run two threads; one waits for input and updates state accordingly while the other runs the game.
The other (more common in game development) way is to write the game as one big loop that executes many times a second, updating game state, redrawing the screen, and checking for input.
But instead of blocking when you get key input, you check for the presence of pending keypresses, and if nothing has happened, you just continue through your loop. If you have multiple input sources (keyboard, network, etc.) they all get put there in the loop, checking one after another.
Yes, it's called polling. No, it's not efficient. But high-end games are usually all about pulling the maximum performance and framerates out of the computer, not running cool.
For added efficiency, you can optionally block with a timeout -- saying "wait for a keypress, but no longer than 300 milliseconds" so you can continue on with your loop.
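select() comes to mind on POSIX systems, but there are other ways of waiting or checking for input as well. On Windows, select() only works on sockets, so for console input you would wait on the console input handle instead, for example with WaitForSingleObject() and a timeout.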
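As a rough illustration of the "block with a timeout" idea, here is a minimal sketch in C using POSIX select(). It assumes a POSIX system (on Windows, select() does not work on the console) and a descriptor that delivers single keys, which for a terminal means switching it out of line-buffered mode first. The names wait_for_input and game_tick are made up for this example.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if fd is readable, 0 if timeout_ms elapsed first, -1 on error. */
int wait_for_input(int fd, int timeout_ms)
{
    fd_set readable;
    struct timeval tv;

    assert(timeout_ms >= 0);
    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(fd + 1, &readable, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

/* One pass through the game loop: take a key if one arrived within the
 * timeout, then advance the game whether or not there was input.
 * Returns the key that was read, or 0 if there was none. */
int game_tick(int input_fd, int timeout_ms, int *ticks)
{
    char key = 0;
    int got_key = 0;

    if (wait_for_input(input_fd, timeout_ms) == 1 &&
        read(input_fd, &key, 1) == 1)
        got_key = 1;        /* a real game would react to `key` here */

    (*ticks)++;             /* update state and redraw regardless of input */
    return got_key ? key : 0;
}
```

A game would call game_tick() in a loop with the input descriptor (STDIN_FILENO for a terminal) and a timeout matching how fast the pieces should fall.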

You could work out how to change stdin to non-blocking, which would enable you to write something like tetris, but the game might be more directly expressed in an event-driven paradigm. Maybe it's a good excuse to learn windows programming.
Anyway, if you want to go the console route and you are using the Microsoft compiler, then you should have kbhit() available (via conio.h; newer versions spell it _kbhit()), which tells you whether a keypress is waiting, so that a following getch() will not block.
I should also mention that the MinGW gcc compiler 3.4.5 supports kbhit() as well.

Related

Why should nesting of QEventLoops be avoided?

In his Qt event loop, networking and I/O API talk, Thiago Macieira mentions that nesting of QEventLoop's should be avoided:
QEventLoop is for nesting event Loops... Avoid it if you can because it creates a number of problems: things might reenter, new activations of sockets or timers that you were not expecting.
Can anybody expand on what he is referring to? I maintain a lot of code that uses modal dialogs which internally nest a new event loop when exec() is called so I'm very interested in knowing what kind of problems this may lead to.
A nested event loop costs you 1-2kb of stack. It takes up 5% of the L1 data cache on typical 32kb L1 cache CPUs, give-or-take.
It has the capacity to reenter any code already on the call stack. There are no guarantees that any of that code was designed to be reentrant. I'm talking about your code, not Qt's code. It can reenter code that has started this event loop, and unless you explicitly control this recursion, there are no guarantees that you won't eventually run out of stack space.
In current Qt, there are two places where, due to long-standing API bugs or platform inadequacies, you have to use nested exec: QDrag and platform file dialogs (on some platforms). You simply don't need to use it anywhere else. You do not need a nested event loop for non-platform modal dialogs.
Reentering the event loop is usually caused by writing pseudo-synchronous code where one laments the supposed lack of yield() (co_yield and co_await have landed in C++ now!), hides one's head in the sand and uses exec() instead. Such code typically ends up being barely palatable spaghetti and is unnecessary.
For modern C++, using the C++20 coroutines is worthwhile; there are some Qt-based experiments around, easy to build on.
There are Qt-native implementations of stackful coroutines: Skycoder42/QtCoroutines, a recent project, and the older ckamm/qt-coroutine. I'm not sure how fresh the latter code is. It looks like it all worked at some point.
Writing asynchronous code cleanly without coroutines is usually accomplished through state machines, see this answer for an example, and QP framework for an implementation different from QStateMachine.
Personal anecdote: I couldn't wait for C++ coroutines to become production-ready, and I now write asynchronous communication code in golang, and statically link that into a Qt application. Works great, the garbage collector is unnoticeable, and the code is way easier to read and write than C++ with coroutines. I had a lot of code written using C++ coroutines TS, but moved it all to golang and I don't regret it.
A nested event loop will lead to ordering inversion (at least on Qt 4).
Let's say you have the following sequence of things happening:
enqueued in outer loop: 1,2,3
processing 1 => spawn inner loop
enqueue 4 in inner loop
processing 4
exit inner loop
processing 2
So you see the processing order was: 1,4,2,3.
I speak from experience and this usually resulted in a crash in my code.
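The sequence above can be reproduced with a toy model in plain C. This is only a sketch of the scenario as described, with one small queue per loop and integer event IDs; it is not how Qt implements its event queue, and all names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 8

struct queue { int items[MAX_EVENTS]; size_t head, tail; };
struct trace { int order[MAX_EVENTS]; size_t count; };

static void enqueue(struct queue *q, int ev)
{
    assert(q->tail < MAX_EVENTS);
    q->items[q->tail++] = ev;
}

static void run_loop(struct queue *q, struct trace *t);

/* Records the event; handling event 1 spawns a nested loop, and
 * event 4 arrives while that nested loop is the one running. */
static void handle(int ev, struct trace *t)
{
    t->order[t->count++] = ev;
    if (ev == 1) {
        struct queue inner = {0};
        enqueue(&inner, 4);
        run_loop(&inner, t);   /* 2 and 3 stay parked in the outer loop */
    }
}

static void run_loop(struct queue *q, struct trace *t)
{
    while (q->head < q->tail)
        handle(q->items[q->head++], t);
}

/* Queues 1, 2, 3 in the outer loop and records the processing order. */
void demo_inversion(struct trace *t)
{
    struct queue outer = {0};
    t->count = 0;
    enqueue(&outer, 1);
    enqueue(&outer, 2);
    enqueue(&outer, 3);
    run_loop(&outer, t);
}
```

After demo_inversion() the trace holds 1, 4, 2, 3: event 4 was created last but is handled before events that were already waiting.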

Why does zumero_sync need to be called multiple times?

According to the documentation for zumero_sync:
If a large amount of information needs to be pulled from the server,
this function may need to be called more than once.
In my Android app that uses Zumero that's no problem; I just keep calling zumero_sync until the return value doesn't start with "0;".
However, now I'm trying to write an admin script that also syncs with my server dbfiles. I'd like to use the sqlite3 shell, and have the script pass the SQL to execute via command line arguments. I need to call zumero_sync in a loop (which SQLite doesn't support) to make sure the db is fully synced. If I had to, I could invoke sqlite3 in a loop (reading its output, looking for "0;"), or even write a C++ app to call the SQLite/Zumero functions natively. But it certainly would be easier if a single zumero_sync was enough.
I guess my real question is: could zumero_sync be changed so it completes the sync before returning? If there are cases where the existing behavior is more useful, maybe there could be a parameter for specifying which mode to use?
I see two basic questions here:
(1) Why does zumero_sync() work the way it does?
(2) Can it work differently?
I'll answer (2) first, since it's easier: Yes, it could work differently. Rather, we could (and probably will soon, now that you brought this up) implement an additional function, named something like zumero_sync_complete(), which performs [the guts of] zumero_sync() in a loop and returns after the sync is complete.
We didn't implement zumero_sync_complete() because it doesn't add much value. It's a simple loop, so you can darn well write it yourself. :-)
Er, except in scripting environments which don't support loops. Like the sqlite3 shell.
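For an environment that does have loops, the "simple loop" might look roughly like the sketch below. It relies only on the convention mentioned in the question, that a result starting with "0;" means more calls are needed. The callback type sync_step_fn and the fake step are hypothetical stand-ins for one sync round-trip; they are not part of the Zumero API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for one sync round-trip; NOT the Zumero API.
 * Returns a result string; a "0;" prefix means "partial, call again". */
typedef const char *(*sync_step_fn)(void *ctx);

/* Calls step() until the result no longer starts with "0;".
 * Returns the number of calls made, or -1 if max_calls was reached
 * while the sync was still partial. A NULL result also stops the loop. */
int sync_until_complete(sync_step_fn step, void *ctx, int max_calls)
{
    assert(step != NULL);
    for (int calls = 1; calls <= max_calls; calls++) {
        const char *result = step(ctx);
        if (result == NULL || strncmp(result, "0;", 2) != 0)
            return calls;
        /* room here for a progress bar, error handling, and so on */
    }
    return -1;
}

/* Fake step for demonstration: partial until *remaining reaches zero.
 * The strings it returns are invented for the example. */
const char *fake_step(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return "0;partial";
    }
    return "done";
}
```

The max_calls cap is a design choice of this sketch, so that a server that keeps returning partial results cannot hang the caller forever.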
Answer to (1):
The Zumero sync protocol is designed to give the server the flexibility to return partial results if it wants to do so. And for the sake of reducing load on the server (and increasing its scalability) it often does want to do exactly that.
Given that, one reason to expose this to the client is to increase the client's flexibility as well. As long as we're making multiple roundtrips, we might as well give the client an opportunity to do something (like, maybe, update a progress bar) in between them.
Another thing a client might want to do in between loop iterations is handle an error.
Or, in the case of a multithreaded client, it might want to deal with changes that happened on the client while the sync is going on.
Which raises the question of how locking should be managed. Do we hold the sqlite write lock during the entire loop? Or only when absolutely necessary?
Bottom line: A robust app would probably want to implement the loop itself so that it can make its own decisions and retain full control over things.
But, as you observe, the sqlite3 shell doesn't have loops. And it's not an app. And it doesn't have threads. Or progress bars. So it's a use case where a simpler-and-less-powerful form of zumero_sync() would make sense.

Is a preemptive multitasking OS possible on the interruptless DCPU-16?

I am looking into various OS designs in the hopes of writing a simple multitasking OS for the DCPU-16. However, everything I read about implementation of preemptive multitasking is centered around interrupts. It sounds like in the era of 16-bit hardware and software, cooperative multitasking was more common, but that requires every program to be written with multitasking in mind.
Is there any way to implement preemptive multitasking on an interruptless architecture? All I can think of is an interpreter which would dynamically switch tasks, but that would have a huge performance hit (possibly on the order of 10-20x+ if it had to parse every operation and didn't let anything run natively, I'm imagining).
Preemptive multitasking is normally implemented by having interrupt routines post status changes/interesting events to a scheduler, which decides which tasks to suspend, and which new tasks to start/continue based on priority. However, other interesting events can occur when a running task makes a call to an OS routine, which may have the same effect.
But all that matters is that some event is noted somewhere, and the scheduler decides who to run. So you can make all such event signalling/scheduling occur only on OS calls.
You can add egregious calls to the scheduler at "convenient" points in various task application code to make your system switch more often. Whether it just switches, or uses some background information such as elapsed time since the last call is a scheduler detail.
Your system won't be as responsive as one driven by interrupts, but you've already given that up by choosing the CPU you did.
Actually, yes. The most effective method is to simply patch run-times in the loader. Kernel/daemon stuff can have custom patches for better responsiveness. Even better, if you have access to all the source, you can patch in the compiler.
The patch can consist of a distributed scheduler of sorts. Each program can be patched to have a very low-latency timer; on load, it will set the timer, and on each return from the scheduler, it will reset it. A simplistic method would allow code to simply do a check such as
if (timer - start_timer >= quantum) yield_to_scheduler();
(where quantum is the time slice you allow), which doesn't cost too much performance. The main trouble is finding good points to put the checks in. Between every function call is a start, and detecting loops and inserting checks in them is primitive but effective if you really need to preempt responsively.
It's not perfect, but it'll work.
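A minimal sketch of that inserted check, written in C rather than DCPU-16 assembly. The struct, the quantum field and the stub scheduler are assumptions made for the example; a real scheduler entry would save the task's registers and pick another task.

```c
#include <assert.h>

struct slice {
    unsigned start;    /* timer value when the task was last resumed */
    unsigned quantum;  /* ticks a task may run before it should yield */
    unsigned yields;   /* how often the stub scheduler was entered */
};

/* Stub scheduler entry: only counts the call and restarts the slice. */
static void yield_to_scheduler(struct slice *s, unsigned now)
{
    s->yields++;
    s->start = now;    /* reset the timer on return from the scheduler */
}

/* The check a loader or compiler would insert at function calls and
 * loop back-edges: one subtraction, one comparison, one branch. */
void preempt_check(struct slice *s, unsigned now)
{
    assert(s->quantum > 0);
    if (now - s->start >= s->quantum)  /* unsigned math survives wraparound */
        yield_to_scheduler(s, now);
}
```

Using unsigned subtraction means the check keeps working when the timer wraps around, which matters on a machine with a small word size.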
The main issue is making sure that the timer return is low latency; that way it is just a comparison and branch. Also, handling exceptions - errors in the code that cause, say, infinite loops - in some way. You can technically use a fairly simple hardware watchdog timer and assert a reset on the CPU without clearing any of the RAM; an in-RAM routine would be where RESET vector points, which would inspect and unwind the stack back to the program call (thus crashing the program but preserving everything else). It's sort of like a brute-force if-all-else-fails crash-the-program. Or you could POTENTIALLY change it to multi-task this way, RESET as an interrupt, but that is much more difficult.
So...yes. It's possible but complicated; using techniques from JIT compilers and dynamic translators (emulators use them).
This is a bit of a muddled explanation, I know, but I am very tired. If it's not clear enough I can come back and clear it up tomorrow.
By the way, asserting reset on a CPU mid-program sounds crazy, but it is a time-honored and proven technique. Early versions of Windows even did it, on the 286 I think, because that CPU had no way to switch from protected mode back to real mode. Other processors and OSes have done it too.
EDIT: So I did some research on what the DCPU is, haha. It's not a real CPU. I have no idea if you can assert reset in Notch's emulator, I would ask him. Handy technique, that is.
I think your assessment is correct. Preemptive multitasking occurs if the scheduler can interrupt (in the non-inflected, dictionary sense) a running task and switch to another autonomously. So there has to be some sort of actor that prompts the scheduler to action. If there are no interrupting devices (in the inflected, technical sense) then there's little you can do in general.
However, rather than switching to a full interpreter, one idea that occurs is just dynamically reprogramming supplied program code. So before entry into a process, the scheduler knows full process state, including what program counter value it's going to enter at. It can then scan forward from there, substituting, say, either the twentieth instruction code or the next jump instruction code that isn't immediately at the program counter with a jump back into the scheduler. When the process returns, the scheduler puts the original instruction back in. If it's a jump (conditional or otherwise) then it also effects the jump appropriately.
Of course, this scheme works only if the program code doesn't dynamically modify itself. And in that case you can preprocess it so that you know in advance where jumps are without a linear search. You could technically allow well-written self-modifying code if it were willing to nominate all addresses that may be modified, allowing you definitely to avoid those in your scheduler's dynamic modifications.
You'd end up sort of running an interpreter, but only for jumps.
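The patching step can be sketched on a toy instruction set. Everything here is hypothetical (three made-up opcodes, an array standing in for program memory); it only shows the bookkeeping: scan forward from the program counter, replace the next jump or the instruction max_run steps ahead with a trap, and put the original back afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes: OP_TRAP stands for "jump back into the scheduler". */
enum op { OP_WORK, OP_JUMP, OP_TRAP };

struct patch {
    size_t index;      /* where the trap was planted */
    enum op original;  /* the instruction it replaced */
    int active;        /* 0 if nothing was patched */
};

/* Scans forward from pc (skipping the instruction at pc itself) and
 * replaces the first jump, or the instruction max_run steps ahead,
 * with OP_TRAP. If the code ends first, nothing is patched. */
struct patch patch_ahead(enum op *code, size_t len, size_t pc, size_t max_run)
{
    struct patch p = {0, OP_WORK, 0};
    for (size_t i = pc + 1; i < len; i++) {
        if (code[i] == OP_JUMP || i - pc >= max_run) {
            p.index = i;
            p.original = code[i];
            p.active = 1;
            code[i] = OP_TRAP;
            break;
        }
    }
    return p;
}

/* Called when the process traps back: restores the original instruction. */
void unpatch(enum op *code, const struct patch *p)
{
    if (p->active)
        code[p->index] = p->original;
}
```

If the saved original is a jump, the scheduler also has to carry out that jump itself before resuming the process, as the answer describes; the sketch leaves that part out.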
Another way is to stick to small tasks driven by an event queue (like current GUI apps).
This is also cooperative, but it has the effect of not needing OS calls: you just return from the task, and the queue moves on to the next task.
If you then need to continue a task, you pass the next "function" and a pointer to the data you need to the task queue.
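A minimal sketch of such a task queue in C, assuming a fixed-size ring buffer and run-to-completion tasks. The names post_task, run_tasks and the two-step example job are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16

typedef void (*task_fn)(void *data);
struct task { task_fn fn; void *data; };

static struct task queue[QUEUE_CAP];
static size_t head, count;

/* Appends a task; returns 0 on success, -1 if the queue is full. */
int post_task(task_fn fn, void *data)
{
    if (count == QUEUE_CAP)
        return -1;
    size_t slot = (head + count) % QUEUE_CAP;
    queue[slot].fn = fn;
    queue[slot].data = data;
    count++;
    return 0;
}

/* Runs tasks in FIFO order until the queue is empty. Each task runs
 * to completion; returning from it is the only "yield". */
void run_tasks(void)
{
    while (count > 0) {
        struct task t = queue[head];
        head = (head + 1) % QUEUE_CAP;
        count--;
        t.fn(t.data);
    }
}

/* Example job split in two: step 1 does part of the work, then posts
 * its continuation (the next function plus the same data pointer). */
struct job { int value; };

void job_step2(void *data) { ((struct job *)data)->value *= 10; }

void job_step1(void *data)
{
    ((struct job *)data)->value += 1;
    post_task(job_step2, data);
}
```

With two jobs posted, the queue runs step 1 of each before step 2 of either, so long jobs interleave without any job being interrupted.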

Possible to fork a process outside from it?

The title is fairly self-explanatory: let's say we have two processes, A and F. F wants to fork A while F has control of the CPU (and A is suspended, since the CPU is running F).
I have Googled, but nothing related showed up. Is such a thing possible in Unix environments?
There is definitely no standard and/or portable way to clone a process from the outside but depending on the OS, there are certainly possible ways to divert a process from its task and force it to clone itself or do whatever you want.
I don't think it's a good idea in any way, but it may be possible for process F to attach to A using a debugger interface such as ptrace, and then do something like suspending the target process, saving its state, diverting the process to run fork, and restoring its original state.
It should be noted that your cloning process will probably need to handle some odd cases around threads and the like.
No it's not possible.
The fork() system call makes a copy of the parent, so if you call fork() in the F process, the child will be a copy of F, there's nothing you can do to change this behavior.
The reason this is not possible is that, normally with fork(), there is exactly one difference to start out with between the two processes: the return value of the fork() call itself. Without such a call inside the code of A, there is no way for the processes to have any difference between them, so they would both be doing exactly the same thing, when normally you want one of the processes to start doing something different.
How exactly do you think what you want to do should work?
No, this would be a huge security hole that would result in the leakage of sensitive information if it were possible.
At best, you could set up a signal handler in the parent process that would fork(2) off a child (maybe exec(2) a pre-configured child process?).
I think you would be better served by looking into message passing between two processes that have CPU affinity set up, but even then, I think the gains would be nominal (over-optimizing a problem?).
http://www.freebsd.org/cgi/man.cgi?query=cpuset&apropos=0&sektion=0
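The signal-handler idea mentioned above could look roughly like this sketch, which needs A's cooperation: A installs a handler for SIGUSR1, and F asks for a fork with kill(pid_of_A, SIGUSR1). The choice of SIGUSR1 and all function names are assumptions for the example. The handler only sets a flag, and the fork itself happens at a point A chooses in its main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t fork_requested = 0;

/* Handler does the bare minimum: remember that a fork was requested. */
static void on_fork_request(int signo)
{
    (void)signo;
    fork_requested = 1;
}

/* Process A calls this once at startup. Returns 0 on success. */
int install_fork_request_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_fork_request;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Process A calls this from its main loop. Returns the child's pid if
 * a fork was requested, 0 if not, -1 if fork() failed. The child runs
 * child_main() and exits with its return value. */
pid_t fork_if_requested(int (*child_main)(void))
{
    if (!fork_requested)
        return 0;
    fork_requested = 0;
    pid_t pid = fork();
    if (pid == 0)
        _exit(child_main());
    return pid;
}

/* Placeholder for whatever the clone should do. */
int demo_child(void) { return 7; }
```

The child is a copy of A at the moment A reaches fork_if_requested(), not at the moment F sent the signal, which is the price of doing this cooperatively.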

Multitasking using setjmp, longjmp

Is there a way to implement multitasking using the setjmp and longjmp functions?
You can indeed. There are a couple of ways to accomplish it. The difficult part is initially getting the jmpbufs which point to other stacks. Longjmp is only defined for jmpbuf arguments which were created by setjmp, so there's no way to do this without either using assembly or exploiting undefined behavior. User level threads are inherently not portable, so portability isn't a strong argument for not doing it really.
Step 1
You need a place to store the contexts of different threads, so make a queue of jmpbuf structures for however many threads you want.
Step 2
You need to malloc a stack for each of these threads.
Step 3
You need to get some jmpbuf contexts which have stack pointers in the memory locations you just allocated. You could inspect the jmpbuf structure on your machine and find out where it stores the stack pointer. Call setjmp and then modify its contents so that the stack pointer is in one of your allocated stacks. Stacks usually grow down, so you probably want your stack pointer somewhere near the highest memory location. If you write a basic C program, use a debugger to disassemble it, and then find the instructions it executes when you return from a function, you can find out what the offset ought to be. For example, with System V calling conventions on x86, you'll see that it pops %ebp (the frame pointer) and then executes ret, which pops the return address off the stack. So on entry into a function, it pushes the return address and frame pointer. Each push moves the stack pointer down by 4 bytes, so you want the stack pointer to start at the high address of the allocated region, -8 bytes (as if you just called a function to get there). We will fill the 8 bytes next.
The other thing you can do is write some very small (one line) inline assembly to manipulate the stack pointer, and then call setjmp. This is actually more portable, because in many systems the pointers in a jmpbuf are mangled for security, so you can't easily modify them.
I haven't tried it, but you might be able to avoid the asm by just deliberately overflowing the stack by declaring a very large array and thus moving the stack pointer.
Step 4
You need exiting threads to return the system to some safe state. If you don't do this, and one of the threads returns, it will take the address right above your allocated stack as a return address and jump to some garbage location and likely segfault. So first you need a safe place to return to. Get this by calling setjmp in the main thread and storing the jmpbuf in a globally accessible location. Define a function which takes no arguments and just calls longjmp with the saved global jmpbuf. Get the address of that function and copy it to your allocated stacks where you left room for the return address. You can leave the frame pointer empty. Now, when a thread returns, it will go to that function which calls longjmp, and jump right back into the main thread where you called setjmp, every time.
Step 5
Right after the main thread's setjmp, you want to have some code that determines which thread to jump to next, pulling the appropriate jmpbuf off the queue and calling longjmp to go there. When there are no threads left in that queue, the program is done.
Step 6
Write a context switch function which calls setjmp and stores the current state back on the queue, and then longjmp on another jmpbuf from the queue.
Conclusion
That's the basics. As long as threads keep calling context switch, the queue keeps getting repopulated, and different threads run. When a thread returns, if there are any left to run, one is chosen by the main thread, and if none are left, the process terminates. With relatively little code you can have a pretty basic cooperative multitasking setup. There are more things you probably want to do, like implement a cleanup function to free the stack of a dead thread, etc. You can also implement preemption using signals, but that is much more difficult because setjmp doesn't save the floating point register state or the flags registers, which are necessary when the program is interrupted asynchronously.
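The dispatcher part of this (the "safe place" from Step 4 and the pick-the-next-thread code from Step 5) can be shown using only well-defined setjmp/longjmp behavior. This sketch deliberately leaves out Steps 2, 3 and 6: tasks run on the dispatcher's own stack and cannot be suspended halfway, so it is not real multitasking. All names are made up for the example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

typedef void (*task_fn)(void);

static jmp_buf dispatcher_ctx;   /* the "safe place" saved by the main thread */
static size_t next_task;         /* static, so it is reliable after longjmp */
char run_trace[16];              /* records which tasks ran, for inspection */
static size_t trace_len;

static void record(char c)
{
    if (trace_len + 1 < sizeof run_trace)
        run_trace[trace_len++] = c;
}

void task_a(void) { record('a'); }
void task_b(void) { record('b'); }

/* Step 4: where a finished task ends up; jumps back to the dispatcher. */
static void task_exit(void)
{
    longjmp(dispatcher_ctx, 1);
}

/* Step 5: every time control comes back to the setjmp, pick the next
 * task. Returns the number of tasks that were run. */
size_t run_all(task_fn *tasks, size_t n)
{
    memset(run_trace, 0, sizeof run_trace);
    trace_len = 0;
    next_task = 0;

    (void)setjmp(dispatcher_ctx);   /* control returns here after each task */
    if (next_task < n) {
        tasks[next_task++]();       /* run the task to completion */
        task_exit();                /* stands in for the planted return address */
    }
    return next_task;               /* queue is empty: the program is done */
}
```

Here the explicit task_exit() call after the task plays the role of the return address that Step 4 copies onto each allocated stack.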
It may be bending the rules a little, but GNU pth does this. It's possible, but you probably shouldn't try it yourself except as an academic proof-of-concept exercise; use the pth implementation if you want to do it seriously and in a remotely portable fashion -- you'll understand why when you read the pth thread creation code.
(Essentially it uses a signal handler to trick the OS into creating a fresh stack, then longjmp's out of there and keeps the stack around. It works, evidently, but it's sketchy as hell.)
In production code, if your OS supports makecontext/swapcontext, use those instead. If it supports CreateFiber/SwitchToFiber, use those instead. And be aware of the disappointing truth that one of the most compelling uses of coroutines -- that is, inverting control by yielding out of event handlers called by foreign code -- is unsafe because the calling module has to be reentrant, and you generally can't prove that. This is why fibers still aren't supported in .NET...
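For comparison, here is a minimal sketch of a coroutine built on getcontext/makecontext/swapcontext, assuming a system that still ships the ucontext functions (glibc on Linux does). The 64 KB stack size and all names are arbitrary choices for the example.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, co_ctx;
static int steps_done;

/* Coroutine body: does one unit of work, then yields to the caller. */
static void counter_coroutine(void)
{
    for (;;) {
        steps_done++;
        swapcontext(&co_ctx, &main_ctx);   /* yield */
    }
}

/* Creates the coroutine on its own stack and resumes it `times` times.
 * Returns the number of steps it completed, or -1 on setup failure. */
int run_coroutine(int times)
{
    assert(times >= 0);
    void *stack = malloc(STACK_SIZE);
    if (stack == NULL || getcontext(&co_ctx) != 0) {
        free(stack);
        return -1;
    }
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = STACK_SIZE;
    co_ctx.uc_link = &main_ctx;     /* where to go if the body ever returns */
    makecontext(&co_ctx, counter_coroutine, 0);

    steps_done = 0;
    for (int i = 0; i < times; i++)
        swapcontext(&main_ctx, &co_ctx);   /* resume until the next yield */

    free(stack);                    /* the coroutine is never resumed again */
    return steps_done;
}
```

Unlike the setjmp approach, the stack is handed over through a documented field (uc_stack), so nothing depends on the layout of a jmp_buf.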
This is a form of what is known as userspace context switching.
It's possible but error-prone, especially if you use the default implementation of setjmp and longjmp. One problem with these functions is that in many operating systems they'll only save a subset of 64-bit registers, rather than the entire context. This is often not enough, e.g. when dealing with system libraries (my experience here is with a custom implementation for amd64/windows, which was pretty stable, all things considered).
That said, if you're not trying to work with complex external codebases or event handlers, and you know what you're doing, and (especially) if you write your own version in assembler that saves more of the current context (if you're using 32-bit windows or linux this might not be necessary, if you use some versions of BSD I imagine it almost definitely is), and you debug it paying careful attention to the disassembly output, then you may be able to achieve what you want.
I did something like this for studies.
https://github.com/Kraego/STM32L476_MiniOS/blob/main/Usercode/Concurrency/scheduler.c
The context/thread switching is done by setjmp/longjmp. The difficult part was to get the allocated stack correct (see allocateStack()) this depends on your platform.
This is just a demonstration how this could work, I would never use this in production.
As was already mentioned by Sean Ogden,
longjmp() is not good for multitasking, as
it can only move the stack upward and can't
jump between different stacks. No go with that.
As mentioned by user414736, you can use getcontext/makecontext/swapcontext
functions, but the problem with those is that
they are not fully in user-space. They actually
call the sigprocmask() syscall because they switch
the signal mask as part of the context switching.
This makes swapcontext() much slower than longjmp(),
and you likely don't want the slow co-routines.
To my knowledge there is no POSIX-standard solution to
this problem, so I compiled my own from different
available sources. You can find the context-manipulating
functions extracted from libtask here:
https://github.com/dosemu2/dosemu2/tree/devel/src/base/lib/mcontext
The functions are:
getmcontext(), setmcontext(), makemcontext() and swapmcontext().
They have semantics similar to the standard functions with similar names,
but they also mimic the setjmp() semantics in that getmcontext()
returns 1 (instead of 0) when jumped to by setmcontext().
On top of that you can use a port of libpcl, the coroutine library:
https://github.com/dosemu2/dosemu2/tree/devel/src/base/lib/libpcl
With this, it is possible to implement fast cooperative user-space
threading. It works on Linux, on the i386 and x86_64 architectures.
