MFT dead wood; and MFTs on removable media - ntfs-mft

I am new to the topic of the MFT in NTFS. I have read a number of documents about it but so far I haven't found explicit statements on some questions that immediately come to mind.
If you have a Windows desktop and read files from a removable medium which is NTFS-formatted (say, a USB stick or removable HD), then I assume that the timeline information (access dates, etc.) gets written to the MFT on the removable medium. So, if the USB stick or HD is removed, the timeline disappears. Does the host OS (the Windows desktop) separately retain any information about which files were opened and on which removable volume?
Articles on the MFT say that initially no MFT record is deleted when a file is deleted, so the MFT grows and grows. There is mention that some of the records of deleted files will eventually get overwritten as the disk fills up, but I can't find details about the algorithm used. At the same time there is the implication that there is no upper limit on MFT size, and that it can crash thinly-resourced systems. This sounds rather extreme. So, is there any way of trimming the dead wood (e.g., a utility which will permanently delete the unwanted entries)? I read that the OS won't allow the MFT size to be changed on an active system, but perhaps a utility could run before the full system loads, as CHKDSK does (I think)?
Can a hard upper limit on MFT size be set (short of completely filling the HD)?

Related

Why does InnoDB use a buffer pool, not mmap the entire file?

InnoDB uses a buffer pool of configurable size to store the least recently used pages (B+tree blocks).
Why not mmap the entire file instead? Yes, this does not work for changed pages, because you want to store them in the doublewrite buffer before writing them back to their destination. But mmap lets the kernel manage the LRU for pages and avoids userspace copying. Also, the in-kernel copy code does not use vector instructions (to avoid saving their registers in the process context).
But when a page is not changed, why not use mmap to read pages and let the kernel manage caching them in the filesystem RAM cache? Then you need a "custom" userspace cache only for changed pages.
The LMDB author mentioned that he chose the mmap approach to avoid copying data from the filesystem cache to userspace and to avoid reinventing the LRU.
What critical disadvantages of mmap am I missing that lead to the buffer pool approach?
Disadvantages of MMAP:
Not all operating systems support it (ahem Windows)
Coarse locking. It's difficult to allow many clients to make concurrent access to the file.
Relying on the OS to buffer I/O writes leads to increased risk of data loss if the RDBMS engine crashes. Need to use a journaling filesystem, which may not be supported on all operating systems.
Can only map a file size up to the size of the virtual memory address space, so on 32-bit OS, the database files are limited to 4GB (per comment from Roger Lipscombe above).
Early versions of MongoDB tried to use MMAP in the primary storage engine (the only storage engine in the earliest MongoDB). Since then, they have introduced other storage engines, notably WiredTiger. This has greater support for tuning, better performance on multicore systems, support for encryption and compression, multi-document transactions, and so on.
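For concreteness, here is a minimal sketch of the two read paths being compared above: letting the kernel page cache serve pages through mmap versus copying a page into a userspace buffer that the application manages, as a buffer pool would. The file name data.db and the 4 KB page size are illustrative assumptions, not InnoDB's actual on-disk layout.

```c
/* Sketch: mmap read path vs. explicit read into a userspace buffer.
   Assumes a file "data.db" of at least one page; names are illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define DB_PAGE_SIZE 4096

/* mmap path: no copy; the kernel page cache decides what stays resident. */
static const void *page_via_mmap(const void *base, size_t page_no) {
    return (const char *)base + page_no * DB_PAGE_SIZE;
}

/* buffer-pool path: the application copies the page into memory it manages
   (and can later pin it, checksum it, stage it in a doublewrite area, ...). */
static int page_via_pread(int fd, size_t page_no, void *buf) {
    ssize_t n = pread(fd, buf, DB_PAGE_SIZE, (off_t)(page_no * DB_PAGE_SIZE));
    return n == DB_PAGE_SIZE ? 0 : -1;
}

int main(void) {
    int fd = open("data.db", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    const void *p0 = page_via_mmap(base, 0);   /* faulted in on first access */

    char *buf = malloc(DB_PAGE_SIZE);
    if (buf && page_via_pread(fd, 0, buf) == 0) /* explicit copy into userspace */
        printf("pages identical: %d\n", memcmp(p0, buf, DB_PAGE_SIZE) == 0);

    free(buf);
    munmap(base, (size_t)st.st_size);
    close(fd);
    return 0;
}
```

The mmap path avoids the copy, but the application gives up control over eviction, write ordering and error handling (an I/O error on a mapped page surfaces as SIGBUS), which is much of what a buffer pool exists to provide.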

Layout of ELF binary in virtual memory

All modern *nix operating systems use the concept of virtual memory (with paging). As far as I know, virtual memory sets a layer of abstraction between the programmer and the real physical memory: the programmer doesn't have to be limited by RAM size and can see the program as one large contiguous space of data, instructions, heap and stack (and manipulate pointers according to that view). When we compile and link source code we get an executable file stored on disk, known as an ELF file, which contains all the data and instructions of the program besides some additional information, such as the stack and heap sizes (the stack and heap themselves are only created at runtime).
Now my questions:
1. How is this binary file (ELF) mapped to virtual memory?
2. Does every process have its own virtual memory (a page file)?
3. What is the program's layout after being mapped to virtual memory?
4. What exactly is the preferred base address and how does it look in virtual memory?
5. What is the difference between an RVA and an offset?
You don't have to answer all the questions or give detailed answers; instead, you can point me to good, thorough readings about the subject. Thanks.
How is this binary file (ELF) mapped to virtual memory?
The executable file contains instructions to the loader on how to lay out the address space. On some systems, parts of the executable can be mapped to memory and serve as a page file.
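As an illustration of those "instructions to the loader": the PT_LOAD program headers in an ELF file describe which file ranges get mapped at which virtual addresses and with which permissions. A minimal sketch (ELF64 only, basic error handling, file path supplied on the command line) that prints them:

```c
/* Sketch: dump the PT_LOAD program headers of an ELF64 file. */
#include <elf.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1) { perror("fread"); return 1; }
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fprintf(stderr, "not an ELF64 file\n");
        return 1;
    }

    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        fseek(f, (long)(eh.e_phoff + i * sizeof ph), SEEK_SET);
        if (fread(&ph, sizeof ph, 1, f) != 1) break;
        if (ph.p_type == PT_LOAD)   /* a segment the loader maps into memory */
            printf("LOAD offset=0x%lx vaddr=0x%lx filesz=0x%lx memsz=0x%lx %c%c%c\n",
                   (unsigned long)ph.p_offset, (unsigned long)ph.p_vaddr,
                   (unsigned long)ph.p_filesz, (unsigned long)ph.p_memsz,
                   (ph.p_flags & PF_R) ? 'r' : '-',
                   (ph.p_flags & PF_W) ? 'w' : '-',
                   (ph.p_flags & PF_X) ? 'x' : '-');
    }
    fclose(f);
    return 0;
}
```

Running it on, say, /bin/ls shows the same segments that readelf -l reports; those segments are what the kernel and dynamic loader map into the process's address space at startup.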
Does every process have its own virtual memory (a page file)?
Every process has its own logical address space. Some areas within that address space may be shared with other processes.
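On Linux you can see that per-process logical address space directly: /proc/self/maps lists the mappings of whichever process reads it, so two processes running the same sketch each print their own layout. A minimal example:

```c
/* Sketch (Linux-specific): print this process's own address-space layout. */
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) { perror("fopen"); return 1; }
    char line[512];
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);   /* start-end perms offset dev inode path */
    fclose(f);
    return 0;
}
```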
What is the program's layout after being mapped to virtual memory?
That depends upon the system and what the executable told the loader to do.
What exactly is the preferred base address and how does it look in virtual memory?
That is just the desirable start location for loading something in memory. Most compilers generate relocatable code that is not tied to any specific logical address.
What is the difference between an RVA and an offset?
An RVA (relative virtual address) is just another name for an offset. What is not clear in your question is what type of offset you are talking about: there are byte offsets within pages, whereas an RVA is usually an offset from a loading location and can span pages.

On what parameters does the boot sequence vary?

Does every Unix flavor have the same boot-sequence code? I mean, there are different kernel releases for the different flavors, so is it possible that the code for the boot sequence differs once the kernel is loaded? Or do they always keep their boot sequence (and code) common?
Edit: I want to know in detail how the boot process is done.
Where does the MBR find GRUB? How is this information stored? Is it hard-coded by default?
Is there any block-level partition architecture involved in the boot sequence?
How does GRUB locate the kernel image? Is there a common location where the kernel image is stored?
I searched a lot on the web, but it only shows the common architecture: BIOS -> MBR -> GRUB -> Kernel -> Init.
I want to know the details of everything. What should I do to learn all this? Is there any way I could debug the boot process?
Thanks in advance!
First of all, the boot process is extremely platform and kernel dependent.
The point is normally getting the kernel image loaded somewhere in memory and running it, but the details may differ:
where do I get the kernel image? (file on a partition? fixed offset on the device? should I just map a device in memory?)
what should be loaded? (only a "core" image? also a ramdisk with additional data?)
where should it be loaded? Is additional initialization (CPU/MMU status, device initialization, ...) required?
are there kernel parameters to pass? Where should they be put for the kernel to see?
where is the configuration for the bootloader itself stored (hard-coded, files on a partition, ...)? How to load the additional modules? (bootloaders like GRUB are actually small OSes by themselves)
Different bootloaders and OSes may do this stuff differently. The "UNIX-like" bit is not really relevant; an OS starts being ostensibly UNIXy (POSIX syscalls, init process, POSIX userland, ...) mostly after the kernel starts running.
Even on common x86 PCs the start differs deeply between "traditional BIOS" and UEFI mode (in the latter case, the UEFI firmware itself can load and start the kernel, without additional bootloaders being involved).
Coming down to the start of a modern Linux distribution on x86 in BIOS mode with GRUB2, the basic idea is to quickly get up and running a system which can deal with "normal" PC abstractions (disk partitions, files on filesystems, ...), keeping to a minimum the code that has to deal with hard-coded disk offsets.
1. GRUB is not a monolithic program; it is composed of stages. When booting, the BIOS loads and executes the code stored in the MBR, which is the first stage of GRUB (a small sketch of how to inspect that first sector is shown below). Since the amount of code that can be stored there is extremely limited (a few hundred bytes), all this code does is act as a trampoline for the next GRUB stage (somehow, it "boots GRUB");
2. the MBR code contains, hard-coded, the address of the first sector of the "core image"; this sector, in turn, contains the code to load the rest of the "core image" from disk (again, hard-coded as a list of disk sectors);
3. once the core image is loaded, the ugly work is done, since the GRUB core image normally contains basic file system drivers, so it can load additional configuration and modules from regular files on the boot partition;
4. what happens now depends on the configuration of the specific boot entry; for booting Linux, usually there are two files involved: the kernel image and the initrd:
the initrd contains the "initial ramdisk", the barebones userland mounted as / in the early boot process (before the kernel has mounted the real filesystems); it mostly contains device detection helpers, device drivers, filesystem drivers, ... to allow the kernel to load on demand the code needed to mount the "real" root partition;
the kernel image is a (usually compressed) executable image in some format, which contains the actual kernel code; the bootloader extracts it into memory (following some rules), puts the kernel parameters and the initrd memory position in some known memory location and then jumps to the kernel entry point, whence the kernel takes over the boot process;
5. from there, the "real" Linux boot process starts, which normally involves loading device drivers, starting init, mounting disks and so on.
Again, this is all (x86, BIOS, Linux, GRUB2)-specific; points 1-2 are different on architectures without an MBR, and are skipped completely if GRUB is loaded straight from UEFI; 1-3 are different/avoided if UEFI (or some other loader) is used to load the kernel image directly. The initrd may not be involved if the kernel image already bundles all that is needed to start (typical of embedded images); details of points 4-5 are different for different OSes (although the basic idea is usually similar). And, on embedded machines, the kernel may be placed directly at a "magic" location that is automatically mapped in memory and run at start.
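To make points 1-2 above concrete, here is a minimal sketch that reads the first 512-byte sector from a disk image and checks the 0x55AA signature the BIOS requires before it will jump into the MBR code. The disk.img path is an illustrative assumption (reading a real /dev/sda needs root privileges).

```c
/* Sketch: inspect the first 512-byte sector (MBR) of a disk or disk image. */
#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv) {
    const char *path = argc > 1 ? argv[1] : "disk.img";  /* hypothetical image */
    FILE *f = fopen(path, "rb");
    if (!f) { perror("fopen"); return 1; }

    uint8_t sector[512];
    if (fread(sector, 1, sizeof sector, f) != sizeof sector) {
        fprintf(stderr, "short read\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    /* Bytes 510-511 must be 0x55 0xAA for the BIOS to treat the sector as bootable. */
    printf("boot signature: %02x %02x (%s)\n", sector[510], sector[511],
           (sector[510] == 0x55 && sector[511] == 0xAA) ? "valid" : "invalid");
    /* Everything before offset 446 is the bootstrap code (GRUB's first stage);
       offsets 446-509 hold the four classic partition table entries. */
    return 0;
}
```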

File size limit for SQLite on a 32-bit system

I'm using SQLite as temporary storage to calculate statistics about a moderately large data set.
I'm wondering what will happen if my database exceeds 2 GB on a 32-bit system. (I can't currently change the system to 64-bit.)
Does it use memory-mapped files and break if the size of the file exceeds addressable memory (like MongoDB)?
According to the SQLite documentation, the maximum size of a database file is ~140 terabytes, and in practice it is limited by the OS/file system.
You can read more here (note the Pages section): http://www.sqlite.org/fileformat2.html
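The page-based limit mentioned there is the product of two values you can query from the C API (the maximum database size is page_size * max_page_count). A minimal sketch, assuming an existing database file named stats.db and linking with -lsqlite3:

```c
/* Sketch: query the two values that bound an SQLite database's size. */
#include <stdio.h>
#include <sqlite3.h>

static void show(sqlite3 *db, const char *pragma) {
    sqlite3_stmt *stmt;
    if (sqlite3_prepare_v2(db, pragma, -1, &stmt, NULL) == SQLITE_OK) {
        if (sqlite3_step(stmt) == SQLITE_ROW)
            printf("%s -> %lld\n", pragma,
                   (long long)sqlite3_column_int64(stmt, 0));
        sqlite3_finalize(stmt);
    }
}

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("stats.db", &db) != SQLITE_OK) {   /* illustrative path */
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }
    show(db, "PRAGMA page_size;");
    show(db, "PRAGMA max_page_count;");
    sqlite3_close(db);
    return 0;
}
```

Setting PRAGMA max_page_count to a smaller value is also the usual way to cap how large a single database file is allowed to grow.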
Though this is an old question, let me share my findings for people who reach it.
Although the SQLite documentation states that the maximum size of a database file is ~140 terabytes, your OS imposes its own restrictions on the maximum file size for any type of file.
For example, if you are using a FAT32 disk on Windows, the maximum file size I could achieve for SQLite was 2 GB.
(According to the Microsoft site, the limit on a FAT32 system is 4 GB, but my SQLite DB size was still restricted to 2 GB.)
On Linux, I was able to reach 3 GB (that's where I stopped; it could have grown larger).
Find out the file system type of the partition. Remember that the file size limit does not depend on whether the OS is 32-bit or 64-bit, but on the file system of the partition on your hard disk.
See Wikipedia
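On Linux, one quick way to "find out your file system type" programmatically is statfs(); the magic number it returns identifies the filesystem whose file-size limit applies. A sketch, with stats.db as an illustrative path:

```c
/* Sketch (Linux-specific): report the filesystem magic for the partition
   holding the database file. */
#include <stdio.h>
#include <sys/vfs.h>    /* statfs */

int main(void) {
    struct statfs fs;
    if (statfs("stats.db", &fs) != 0) { perror("statfs"); return 1; }
    /* f_type is a magic number, e.g. 0xEF53 for ext2/3/4, 0x4d44 for FAT (msdos). */
    printf("filesystem magic: 0x%lx, block size: %ld\n",
           (unsigned long)fs.f_type, (long)fs.f_bsize);
    return 0;
}
```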

Why are SQLite transactions bound to hard-disk rotation?

There is the following statement in the SQLite FAQ:
A transaction normally requires two complete rotations of the disk platter, which on a 7200RPM disk drive limits you to about 60 transactions per second.
As far as I know, there is a cache on the hard disk, and there might also be an extra cache in the disk driver that abstracts the operation perceived by the software from the actual operation against the disk platter.
Then why and how exactly are transactions so strictly bound to disk platter rotation?
From Atomic Commit In SQLite
2.0 Hardware Assumptions
SQLite assumes that the operating system will buffer writes and that a write request will return before data has actually been stored in the mass storage device. SQLite further assumes that write operations will be reordered by the operating system. For this reason, SQLite does a "flush" or "fsync" operation at key points. SQLite assumes that the flush or fsync will not return until all pending write operations for the file that is being flushed have completed. We are told that the flush and fsync primitives are broken on some versions of Windows and Linux. This is unfortunate. It opens SQLite up to the possibility of database corruption following a power loss in the middle of a commit. However, there is nothing that SQLite can do to test for or remedy the situation. SQLite assumes that the operating system that it is running on works as advertised. If that is not quite the case, well then hopefully you will not lose power too often.
Because it ensures data integrity by making sure the data is actually written onto the disk rather than held in memory. Thus, if the power goes off or something, the database is not corrupted.
This video http://www.youtube.com/watch?v=f428dSRkTs4 talks about the reasons why (e.g., because SQLite is actually used in a lot of embedded devices, where the power might well suddenly go off).
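The pattern the excerpt describes boils down to write-then-fsync: the transaction only counts as committed once fsync() has returned, and on a rotating disk with its write cache disabled that wait is what ties commit rate to platter rotation. A minimal sketch, with journal.tmp as an illustrative file name:

```c
/* Sketch: a write is not durable until fsync() on the file has returned. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("journal.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char record[] = "COMMIT marker\n";
    if (write(fd, record, strlen(record)) != (ssize_t)strlen(record)) {
        perror("write");   /* write() may return before data reaches the platter */
        return 1;
    }
    if (fsync(fd) != 0) {  /* block until the device reports the data is stable */
        perror("fsync");
        return 1;
    }
    close(fd);
    return 0;
}
```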

Resources