OS dev: triple fault when trying to enable paging - Intel

I am building a simple OS for learning purposes and I am currently following this tutorial for enabling paging (I followed different tutorials earlier and customized some things myself). I'm using QEMU instead of Bochs as my emulator.
If I keep paging disabled, everything works fine (even the very basic kmalloc() I implemented), but as soon as I set the PG bit in the cr0 register (i.e. enable paging), everything crashes and QEMU reboots. I suspect that some of the structures I build (page directory, page tables, etc.) are not created or loaded properly, but I have no way of checking.
I've been trying to solve this problem for a while now, but haven't found a solution. Can anyone see where my mistake is?
Here you can find my complete code: https://github.com/davidedellagiustina/ScratchOS (commit 83b5c8c). Paging code is located in src/cpu/paging.*.
Edit: Setting up a super-basic page directory by following exactly this tutorial results in working code. Starting from this simple example, I'm trying to build up the more complex structures (i.e. page_t, page_table_t, page_directory_t) in order to understand my mistake.

In general:
pointers should be for virtual addresses only (and should never be used for physical addresses)
physical addresses should probably use a typedef (e.g. typedef uint32_t phys_address_t) so that later (when you want to support PAE/Physical Address Extension) you can change the type (e.g. use typedef uint64_t phys_address_t instead) without breaking everything. This also means you get compile-time warnings/errors when you make silly mistakes (e.g. using a virtual address/pointer where you need a physical address/unsigned integer).
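For example, a minimal sketch of that typedef discipline (the names here are illustrative, not from the linked repository):

#include <stdint.h>

typedef uint32_t phys_address_t;   /* switch to uint64_t when adding PAE */

/* Takes a physical frame address; the distinct type turns misuse into a
   compile-time diagnostic. */
void map_page(phys_address_t frame, void *virt);

Passing a pointer where a phys_address_t is expected now fails to compile instead of silently "working".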
almost all of the kernel should be using pointers/virtual addresses for everything. Physical addresses are only used by some device drivers (for bus mastering/DMA) and for the physical memory management itself (to allocate physical pages for page tables, etc; before mapping them into a virtual address space). This includes high level memory management ("kmalloc()" should return a void * pointer and not a physical address).
during boot, there's a small period of time when none of the kernel's normal code can work because it uses virtual addresses and paging hasn't been initialized yet. To minimize the size of this period of time (and code duplication caused by having 2 versions of functions - one for "before paging initialized" and another for "after paging initialized") you want to initialize paging as soon as possible; either with a dedicated piece of assembly language startup code that's executed before "main()" (possibly using "statically allocated at compile time" memory in the kernel's ".bss" section for the page directory and page tables), or in the boot loader itself (which is cleaner and more powerful/flexible). Things like setting up a valid kernel stack, and initializing (physical, virtual then heap) memory management, can/should wait until after paging has been initialized.
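For example, a sketch of the "statically allocated at compile time" option (GCC syntax; the names are illustrative):

#include <stdint.h>

/* These land in the kernel's .bss, so the assembly startup code can fill
   them in before any memory manager exists. */
__attribute__((aligned(4096))) static uint32_t page_directory[1024];
__attribute__((aligned(4096))) static uint32_t first_page_table[1024];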
for identity mapping; you'd only need 2 loops (one to create page directory entries and another to create all page table entries), where both loops can be like this (just with different initial values in eax, ecx and edi):
.nextEntry:
stosd                   ; store eax at [edi], then edi += 4 (one entry written)
add eax,0x00001000      ; advance the physical address to the next 4 KiB frame
loop .nextEntry         ; decrement ecx and repeat until ecx == 0
identity mapping isn't great. Normally you want the kernel at a high virtual address (e.g. 0xC0000000) with an area of "deliberately not used to catch NULL pointers" at 0x00000000, and user-space (processes, etc.) using normal virtual addresses between them (e.g. maybe starting at virtual address 0x00400000). This makes it annoying for the code that initializes paging and for the kernel's linker script (which is why it's cleaner to initialize paging in the boot loader and avoid the mess in the kernel). For this case, you will need to temporarily identity map one page (the page containing the final "mov cr0" that enables paging and the "jmp kernel_entry" that transfers control to code/the kernel at a higher address), and you will want to delete that temporarily identity-mapped page after the kernel's main has started.
you will need to become "very familiar" with the debugging capabilities of your emulator. QEMU has a log that can provide very useful clues, and includes a built-in monitor that offers a variety of commands (see https://en.wikibooks.org/wiki/QEMU/Monitor ). You should be able to replace the "mov cr0" (that enables paging) with an endless loop (.die: jmp .die), then use the monitor to stop the emulator after it reaches the endless loop and inspect everything (contents of cr3, contents of physical memory) and find out what is wrong with the page directory or page table entries (and do similar immediately after paging is enabled to inspect the virtual address space before your code does anything with it). QEMU also allows you to attach a remote debugger (GDB).
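For example (an illustrative invocation; adjust the image name and addresses to your build), start QEMU with the monitor on stdio and exception logging, so a triple fault gets logged instead of silently rebooting:

qemu-system-i386 -kernel kernel.bin -monitor stdio -d int -no-reboot -no-shutdown

Then at the (qemu) prompt:

info registers          (CPU state, including cr0 and cr3)
xp /4wx 0x00010000      (dump physical memory, e.g. where your page directory should be)
info mem                (active virtual mappings, once paging is enabled)
info tlb                (virtual-to-physical translations)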

I found out I was missing all the flags in the page directory entries (especially the read/write and the kernel-mode ones), as I was storing just the page table address there. I will keep my repository public and will continue development from now on, in case anyone needs it in the future.
Edit: I had also forgotten to initialize all the pages (with address and present bit) when creating a new page table.
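For anyone hitting the same wall, a minimal sketch of the idea (generic 32-bit non-PAE x86; the helper and macro names are mine, not from the repository):

#include <stdint.h>

#define PDE_PRESENT 0x1   /* bit 0: entry is valid */
#define PDE_RW      0x2   /* bit 1: writable */
/* bit 2 (user/supervisor) left clear = supervisor (kernel) only */

/* A page directory entry is the 4 KiB-aligned physical address of a page
   table OR'd with the flag bits - not the bare address. */
uint32_t make_pde(uint32_t page_table_phys) {
    return (page_table_phys & 0xFFFFF000) | PDE_PRESENT | PDE_RW;
}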

Related

Known reasons why sqlite3_open_v2 can take over 60s on Windows?

I will start with TL;DR version as this may be enough for some of you:
We are trying to investigate an issue that we see in the diagnostic data of our C++ product.
The issue was pinpointed to a timeout on sqlite3_open_v2, which supposedly takes over 60s to complete (we only give it 60s).
We tried multiple different configurations, but were never able to reproduce even a 5s delay on this call.
So the question is whether there are known scenarios in which sqlite3_open_v2 can take that long (on Windows)?
Now to the details:
We are using version 3.10.2 of SQLite. We went through the changelogs from this version up to the current one, and nothing we've found in the bugfix sections suggests there was an issue, addressed in a later SQLite release, that may have caused our problem.
The issue we see affects around 0.1% of unique users across all supported versions of Windows (Win 7, Win 8, Win 10). There are no manual user complaints/reports about it, which may suggest that the problem happens in a context where something serious enough is happening with the user's machine/system that they don't expect anything to work. So something that indicates a system-wide failure is a valid possibility, as long as it can plausibly happen to 0.1% of random Windows users.
There is no data indicating that the same issue ever occurred on Mac, which is also a supported platform with a large enough sample of diagnostic data.
We are using Poco (https://github.com/pocoproject/poco, version: 1.7.2) as a tool for accessing our SQLite database, but we've analyzed the Poco code, and it seems that a failure at this code level could (possibly) explain only ~1% of all collected samples. This is how we've determined that the problem lies in sqlite3_open_v2 taking a long time.
This happens in both DELETE journal mode and WAL mode.
It seems that after this problem happens for the first time for a particular user, each consecutive call to sqlite3_open_v2 takes that long until the user restarts the whole application (possibly the machine; there is no way to tell from our data).
We are using the following flags for sqlite3_open_v2 (as in Poco):
sqlite3_open_v2(..., ..., SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_URI, NULL);
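(For reference, a sketch of the kind of timing wrapper we could ship around this call to confirm where the 60s goes - the wrapper name is hypothetical, C++11:)

#include <sqlite3.h>
#include <chrono>
#include <cstdio>

// Hypothetical diagnostic wrapper: records how long the open really takes
// and its return code, without changing the flags we pass today.
static int timed_open(const char *path, sqlite3 **db) {
    const auto t0 = std::chrono::steady_clock::now();
    const int rc = sqlite3_open_v2(path, db,
        SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_URI, nullptr);
    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - t0).count();
    std::fprintf(stderr, "sqlite3_open_v2 rc=%d after %lld ms\n", rc, (long long)ms);
    return rc;
}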
This usually doesn't happen on startup of the application, so it's not likely to be caused by something happening while our application is not running. This includes power cut-offs causing data destruction (which tends to return SQLITE_CORRUPT anyway, as mentioned in https://www.sqlite.org/howtocorrupt.html).
We were never able to reproduce this issue locally even though we tried different things:
Multiple threads writing to and reading from the DB with the synchronization required by a particular journaling system.
Keeping the SQLite connection open for a long time and working on the DB normally in the meantime.
Trying to hit the HDD hard with other data (dumping /dev/rand (WSL) to multiple files from different processes while accessing the DB normally).
Trying to force antivirus software to scan the DB on every file access (tested with Avast with basically everything enabled, including "scan on open" and "scan on write").
Breaking our internal synchronization required by particular journaling systems.
Calling WinAPI CreateFile with all possible combinations of file-sharing options on the DB file - this caused issues, but sqlite3_open_v2 always returned fast, just with an error.
Calling WinAPI LockFile on random parts of the DB file, which is by the way a nice way of reproducing SQLITE_IOERR, but no luck with reproducing the discussed issue.
Some additional attempts to actually stretch the Poco layer and double-check that our static analysis of the code is right.
We've tried to look for similar issues online, but the only somewhat relevant thing we've found was sqlite3-open-v2-performance-degrades-as-number-of-opens-increase. This doesn't seem to explain our case though, as the numbers of parallel connections involved are way beyond what we have, as well as what typical Windows users would have (unless there is some somewhat popular app exploiting SQLite which we don't know about).
It's very unlikely that this issue is caused by the DB being accessed through a network share, as we are putting the DB file inside %appdata%, unless there is some fairly standard Windows configuration which sets %appdata% to a remote share.
Do you have any ideas about what could cause this issue?
Maybe some hints on what else we should check, or what additional diagnostic data we could collect from users to pinpoint the real reason why this happens?
Thanks in advance

Hardware and Software saves during Context Switch in xv6

I'm studying the xv6 context switch in the Operating Systems: Three Easy Pieces book. I cannot fully understand the Saving and Restoring Context section of Chapter 6 (page 8).
Why are there two types of register saves/restores during the context switch protocol?
What is the difference between the mentioned user registers and kernel registers?
What is the meaning of:
By switching stacks, the kernel enters the call to the switch code in the context of one process (the one that was interrupted) and returns in the context of another (the soon-to-be-executing one).
Why are there two types of register saves/restores during the context switch protocol?
Assuming you are talking about p. 10: the text is a bit misleading (but not nearly as bad as I have seen in some books). It compares the register saves done for interrupts to those done for context switches. It's really not a good comparison.
Register saving in interrupt handling is done the same way as in a function call (and not as in a context switch). You have to preserve any register values you are going to muck with at the start of interrupt handling, then restore them before the interrupt handler returns. You are only dealing with general-purpose registers as well (i.e. not process control registers).
Register saves in context switches are done en masse: all of the process's registers get saved at once. An interrupt service routine might save 4 registers, while a context switch might save more than 30.
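In xv6 specifically you can see the split in the source (quoted from memory, so treat this as a sketch): the trapframe holds the user registers saved on entry to the kernel, while swtch() saves only the kernel's callee-saved registers, because it is entered by an ordinary function call:

// Roughly xv6's x86 struct context from proc.h (uint is xv6's typedef
// for unsigned int). Caller-saved registers are already handled by the
// compiler around the call to swtch(), so only these need storing.
typedef unsigned int uint;

struct context {
  uint edi;
  uint esi;
  uint ebx;
  uint ebp;
  uint eip;   // saved implicitly as swtch's return address
};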
What is the difference between the mentioned user registers and kernel registers?
Some registers are accessible and modifiable in user mode. The general-purpose registers would certainly be user registers. The processor status register is a mixed bag: it can be read in user mode, and it can be modified in some ways as a side effect of executing instructions, but it is generally not directly writable in user mode. You might call that a user register or you might not.
There are other registers that are only accessible in kernel mode. For example, there will be registers that define the process's page table. Other registers will define the system dispatch table.
Note here that only some of the kernel-mode registers are process registers (e.g. those setting up page tables) and need to be saved and restored with the process. Other kernel registers are system-wide (e.g. those for timers and the system dispatch table); those do not change with the process.
By switching stacks, the kernel enters the call to the switch code in the context of one process (the one that was interrupted) and returns in the context of another (the soon-to-be-executing one).
This is a little bit misleading as an excerpt, but it might make more sense in context if I read the book carefully.
A process context switch requires changing all the per-process registers, saved in a block whose structure is defined by the CPU. What I find misleading in your excerpt is that a context switch involves more than just switching stacks.
Typically a context change looks something like:
SAVE_PROCESS_CONTEXT_INSTRUCTION address_of_the_current_process_context_block
LOAD_PROCESS_CONTEXT_INSTRUCTION address_of_the_next_process_context_block
As soon as you load a process context you are in the new process. That switch includes changing the kernel mode stack.
Some operating systems use terminology in their documentation that implies interrupt handlers (especially) and sometimes exception handlers are not executed in the context of a process. In fact, the CPU ALWAYS executes in the context of a process.
As soon as you execute the context switch instruction, you are in the new process, BUT in an exception or interrupt handler in kernel mode. The change in the kernel stack causes the return from the exception or interrupt to resume the new process's user-mode code.
So you are already in the context of the new process after the PCB switch. The resulting change in the kernel-mode stack pointer (i.e. establishing a new kernel-mode stack) causes the return from the exception or interrupt to pick up where the new process was before it entered kernel mode (via exception or interrupt).

App using QTreeView and QStandardItemModel does not catch up

I'm working on a program (notifyfs) which takes care of caching directory entries and watching the underlying filesystem for changes. The cache is stored in shared memory; (GUI) clients can make use of the cache very easily.
Communication between the server (notifyfs) and clients can go through a socket or via the shared memory itself, by sharing a mutex and condition variable.
When a client wants to load a directory it does the following:
a. select a "view", which is a data struct in shared memory consisting of a shared mutex, a condition variable and a small queue (array), used to communicate add/remove/change events with the client.
b. the client populates its model with what it already finds in the shared memory.
c. send a message to the server with a reference to the view and an indication of the path whose contents it wants to load. This may be a path but, if possible, the parent entry.
d. the server receives the message (does some checks), sets a watch on the directory, and syncs the directory. When the directory has not been in the cache yet, this means that every entry it detects is stored in the cache. While doing so, it signals the view (the data in shared memory) that an entry has been added, and it stores this event in the array/queue.
e. the GUI client has a special thread constantly watching this view in shared memory for changes using the pthread_cond_wait call. This thread is a special io thread which can emit three signals: entry added, entry removed and entry changed. It reads the right parameters from the array queue: a reference to the entry, and what the action is. These three signals are connected to three slots in my model, which is based upon a QStandardItemModel.
This works perfectly, and it's very fast. When testing it I had a lot of debugging output. After removing this to test without the extra slow io, it looks like the QTreeView can't keep up with the changes. When loading a directory it shows only two thirds of it, and when going on to load another directory, this gets less and less.
I've connected the different signals from the special thread to the model using Qt::QueuedConnection.
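In code that wiring looks roughly like this (a sketch with hypothetical signal/slot names; a queued connection delivers the slot call in the thread that owns the receiver):

// The io thread object emits, the model (living in the GUI thread) receives;
// Qt::QueuedConnection posts the call as an event to the GUI thread's loop.
QObject::connect(ioWatcher, SIGNAL(entryAdded(int)),
                 model,     SLOT(onEntryAdded(int)),
                 Qt::QueuedConnection);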
Adding a row at a certain position is done using the insertRow(row, list) call, where row is of course the row and list is a QList of the items.
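For reference, a sketch of that call (illustrative names and columns; the model takes ownership of the items):

#include <QStandardItemModel>
#include <QString>

// Insert one directory entry at 'row'; this must run in the thread that
// owns 'model' (the GUI thread).
void addEntry(QStandardItemModel *model, int row,
              const QString &name, const QString &type) {
    QList<QStandardItem *> columns;
    columns << new QStandardItem(name) << new QStandardItem(type);
    model->insertRow(row, columns);
}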
I've been looking into this issue for some time now, and I saw that all the changes are detected by the special io thread, and that the signals are received by the model. Only the signal to the QTreeView is somehow not received. I've been thinking: do I have to set the communication between the model's signal and the receiving slot of the treeview to "Qt::QueuedConnection" as well? Maybe something else?
It was suggested in the reactions to put the model in a separate thread. This was tempting, but it is not the right way to solve this: the model and the view should be in the same thread.
I solved this issue by doing as much as possible of the work of providing the model with data in the special io thread. I moved some functions which populate the model to this special io thread, and used the standard calls to insert or remove a row. That worked.
Thanks to everyone giving suggestions,
Stef Bon

How to capture biometric information on a webpage by using Java

What's the proper way to capture biometric information (pressure, speed, ...) from signing with a stylus on a canvas in a JSP web page?
Alright, since no one else has attempted to answer this question, I shall elaborate on my comment, and hopefully it will serve as an answer for others as well.
First, Java Server Pages (JSP) is a server-side language. It is meant to run on the web server and not in the user's browser. The same goes for other server-side languages like PHP and ASP.
So a server-side language is not able to directly interact with devices (keyboard, scanners, cameras, etc.). Only when the data is submitted by the browser or client program does the server receive it for processing.
For a device to receive input, there are two key pieces of software needed:
The device driver, which must be installed on the user's machine.
The application program, to capture inputs and do any processing.
If either one is missing, the device cannot function. And then there's another issue: depending on the device, there's various feedback from the driver/API that should go back to the application that reads it. For example, if a fingerprint scan was not very successful for some reason, the scanner should tell this to the user. So again, there's a need for interactivity between the device and the user's application.
Thus, using any server-side language is out of the question for such applications.
Now, in order to make this possible, you may use a client-side program. Here are some options.
A native application in VB, C/C++, Pascal or another language. If this is an option, the user must install this application on their computer.
A browser-based program. This can be a program created using Java (not JavaScript or JSP), or an ActiveX component. ActiveX is largely OS/browser dependent. And the truth is that even Java is not truly platform-independent when it comes to different operating systems; there are some technical differences that you'll need to look into. But for most interactivity and high-level operations, yes, Java is more platform-independent than the others. On a personal note, Java is my worst language and I try not to use it anywhere anymore, but that's a different story.
In both options above, every client machine must have its own proprietary drivers and often some sort of API for browser integration.
A year or so ago, I had to program a Bio-Mini fingerprint scanner using VB. It was all sweet in the beginning. Then, due to the restrictions of networkability and concurrent usage, the drivers/SDK could not take the load and things were going wrong. By the way, the drivers/SDK were meant for MS Access. Knowing that the DB was the problem, I started to port this to MySQL, and it was a severe climb from there. I had to do a near-rewrite of the SDK for capturing and comparing data using arrays in VB. And to make things worse, the device was changed and things went wrong again. Do note, though, that the new device was from the same manufacturer.
So keep in mind that even a simple change like that can cause a problem.

How the same key is used across processes to communicate with each other using shared memory

I learned that it is necessary to use the same key in both processes to communicate using shared memory. In the sample code I've seen, the key is hard-coded in both programs (sender, receiver). My doubt is how, in a real-world situation, two unrelated processes use the same key.
I've read about the ftok() function, but it asks for a file path as an argument. How is that possible in a real-world scenario like the one below?
Suppose the user gives a print-to-file command from Firefox, and some other program like Ghostscript is going to make a PS/PDF file (assuming it uses shared memory). How will Firefox and Ghostscript use shared memory here?
Two processes unknown to each other would need to use a defined (and shared) protocol in order to use shared memory together. And that protocol would need to include the information about how to get to the shared memory (e.g. an integer key value for a shmget call). Basically, it would need to define a "hard coded" identifier, or some method for discovering it.
Without some kind of protocol defining this information (including what is in the memory), it would not be possible for one process to even deduce what was in a memory location that was set up by another process.
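For instance, a minimal sketch of such an agreement (the path and project id are illustrative; agreeing on them ahead of time is exactly the "protocol" part):

#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>

int main(void) {
    /* Both programs are built with the SAME agreed-upon path and project
       id; ftok() derives the same key in each (the file must exist). */
    key_t key = ftok("/tmp/myapp.key", 'A');
    if (key == (key_t)-1) { perror("ftok"); return 1; }

    /* Both sides therefore reach the same segment. */
    int shmid = shmget(key, 4096, IPC_CREAT | 0600);
    if (shmid == -1) { perror("shmget"); return 1; }

    void *mem = shmat(shmid, NULL, 0);
    if (mem == (void *)-1) { perror("shmat"); return 1; }
    /* ... read/write the agreed-upon data layout in mem ... */
    return 0;
}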
