Why do the sizes of the Oracle Database block and the Operating System Block differ? I have searched the Oracle website but haven't found a satisfactory answer.
The database block is a logical unit of storage, and the operating system block is a physical unit of storage. They don't have to be different sizes, but they can be, as long as the logical block size is equal to or larger than, and a multiple of, the physical block size. This allows Oracle to retrieve an optimal amount of data regardless of the underlying hardware, so it can be more efficient and incur less overhead.
From the database concepts guide:
Data Blocks and Operating System Blocks
At the physical level, database data is stored in disk files made up
of operating system blocks. An operating system block is the minimum
unit of data that the operating system can read or write. In contrast,
an Oracle block is a logical storage structure whose size and
structure are not known to the operating system.
...
The database requests data in multiples of data blocks, not operating
system blocks.
When the database requests a data block, the operating system
translates this operation into a request for data in permanent
storage. The logical separation of data blocks from operating system
blocks has the following implications:
Applications do not need to determine the physical addresses of data on disk.
Database data can be striped or mirrored on multiple physical disks.
The administration guide also says this:
If the database block size is different from the operating system
block size, then ensure that the database block size is a multiple of
the operating system block size.
...
A larger data block size provides greater efficiency in disk and
memory I/O (access and storage of data). Therefore, consider
specifying a block size larger than your operating system block size
if the following conditions exist:
Oracle Database is on a large computer system with a large amount of memory and fast disk drives. For example, databases controlled by
mainframe computers with vast hardware resources typically use a data
block size of 4K or greater.
The operating system that runs Oracle Database uses a small operating system block size. For example, if the operating system
block size is 1K and the default data block size matches this, the
database may be performing an excessive amount of disk I/O during
normal operation. For best performance in this case, a database block
should consist of multiple operating system blocks.
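To make the "multiple of the operating system block size" rule concrete, here is a minimal sketch (not Oracle-specific) that compares a chosen database block size against the block size reported by statvfs() for the filesystem holding the datafiles. The 8192-byte value and the /u01/oradata path are illustrative assumptions, and f_bsize is only a stand-in for the "operating system block size" from the quote.

```c
/* Minimal sketch: check the "DB block is a multiple of the OS block" rule. */
#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    const unsigned long db_block_size = 8192;   /* e.g. a common Oracle default */
    struct statvfs vfs;

    if (statvfs("/u01/oradata", &vfs) != 0) {   /* hypothetical datafile path */
        perror("statvfs");
        return 1;
    }

    /* f_bsize is the filesystem's preferred I/O block size; here it stands
     * in for the "operating system block size". */
    printf("filesystem block size: %lu\n", (unsigned long)vfs.f_bsize);
    printf("database block size  : %lu\n", db_block_size);

    if (db_block_size >= vfs.f_bsize && db_block_size % vfs.f_bsize == 0)
        printf("OK: database block is a multiple of the OS block\n");
    else
        printf("WARNING: database block is not a multiple of the OS block\n");

    return 0;
}
```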
InnoDB uses a buffer pool of configurable size to cache recently used pages (B+tree blocks), evicting them in least-recently-used order.
Why not mmap the entire file instead? Yes, this does not work for changed pages, because you want to store them in the doublewrite buffer before writing them back to their final location. But mmap lets the kernel manage the LRU for pages and avoids copying into userspace. Also, the in-kernel copy code does not use vector instructions (to avoid having to save their registers in the process context).
But when a page is not changed, why not use mmap to read it and let the kernel cache it in the filesystem RAM cache? Then you would only need a "custom" userspace cache for changed pages.
The LMDB author mentioned that he chose the mmap approach to avoid copying data from the filesystem cache to userspace and to avoid reinventing LRU management.
What critical disadvantages of mmap am I missing that lead to the buffer pool approach?
Disadvantages of MMAP:
Not all operating systems support it (ahem Windows)
Coarse locking. It's difficult to give many clients concurrent access to the file.
Relying on the OS to buffer I/O writes leads to increased risk of data loss if the RDBMS engine crashes. Need to use a journaling filesystem, which may not be supported on all operating systems.
Can only map a file size up to the size of the virtual memory address space, so on 32-bit OS, the database files are limited to 4GB (per comment from Roger Lipscombe above).
Early versions of MongoDB tried to use MMAP in the primary storage engine (the only storage engine in the earliest MongoDB). Since then, they have introduced other storage engines, notably WiredTiger. This has greater support for tuning, better performance on multicore systems, support for encryption and compression, multi-document transactions, and so on.
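For concreteness, here is a minimal sketch of the two read paths being debated above: mapping the file and letting the kernel's page cache stand in for a buffer pool, versus pread() into memory the engine manages itself. The file name and the 4 KB page size are illustrative assumptions, not anything InnoDB or LMDB actually does.

```c
/* Minimal sketch: mmap read path vs. buffer-pool-style pread into userspace. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define PAGE_SIZE 4096   /* assumed page size */

int main(void)
{
    int fd = open("data.db", O_RDONLY);          /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < PAGE_SIZE) {
        fprintf(stderr, "file too small\n");
        return 1;
    }

    /* mmap path: the kernel's page cache serves as the "buffer pool";
     * nothing is copied into userspace until the page is touched. */
    void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }
    unsigned char first_mapped_byte = ((unsigned char *)map)[0];

    /* buffer-pool path: copy the page into memory the engine owns, so it
     * controls eviction, checksumming, and write ordering itself. */
    unsigned char *page = malloc(PAGE_SIZE);
    if (pread(fd, page, PAGE_SIZE, 0) != PAGE_SIZE) { perror("pread"); return 1; }

    printf("first byte via mmap: %u, via pread: %u\n",
           (unsigned)first_mapped_byte, (unsigned)page[0]);

    free(page);
    munmap(map, (size_t)st.st_size);
    close(fd);
    return 0;
}
```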
Page 526 of the textbook Operating Systems – Internals and Design Principles, eighth edition, by William Stallings, says the following:
At the lowest level, device drivers communicate directly with peripheral devices or their controllers or channels. A device driver is responsible for starting I/O operations on a device and processing the completion of an I/O request. For file operations, the typical devices controlled are disk and tape drives. Device drivers are usually considered to be part of the operating system.
Page 527 continues by saying the following:
The next level is referred to as the basic file system, or the physical I/O level. This is the primary interface with the environment outside of the computer system. It deals with blocks of data that are exchanged with disk or tape systems.
The functions of device drivers and basic file systems seem identical to me. As such, I'm not exactly sure how Stallings is differentiating them. What are the differences between these two?
EDIT
From page 555 of the ninth edition of the same textbook:
The next level is referred to as the basic file system, or the physical I/O level. This is the primary interface with the environment outside of the computer system. It deals with blocks of data that are exchanged with disk or tape systems. Thus, it is concerned with the placement of those blocks on the secondary storage device and on the buffering of those blocks in main memory. It does not understand the content of the data or the structure of the files involved. The basic file system is often considered part of the operating system.
Break this down into layers:
Layer 1) Physical I/O to a disk requires specifying the platter, sector and track to read or write to a block.
Layer 2) Logical I/O to a disk arranges the blocks in a numeric sequence, and one reads or writes a specific logical block number that gets translated into the track/platter/sector.
Operating systems generally have support for both logical I/O and physical I/O to the disk. That said, most disks these days do the logical-to-physical translation themselves; O/S support for it is only needed for older disks.
If the device supports logical I/O, the device driver performs the I/O. If the device only supports physical I/O, the device driver usually handles both the logical and physical layers. Thus, the physical I/O layer only exists in drivers for disks that do not do logical I/O in hardware. If the disk supports logical I/O, there is no layer 1 in the driver.
All of the above appears to be what your first quote is addressing.
Layer 3) Virtual I/O reads or writes specific bytes or blocks (depending upon the O/S) within a file. This layer is usually handled outside the device driver. At this layer there are separate modules for each supported file system; virtual I/O requests to all disks using the same file system go through the same module.
Handling virtual I/O requires much more complexity than simply reading and writing disk blocks. The virtual I/O layer has to work with the on-disk file system structure to allocate blocks to a specific file.
This appears to be what the second quote refers to. What is confusing to me is why it calls this the "physical I/O" layer instead of the "virtual I/O" layer.
Everywhere I have been, physical I/O and logical I/O refer to reading and writing raw blocks on a disk without regard to the file system on the disk.
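A rough sketch of the distinction being drawn, assuming a Linux-like system: layer 2 addresses the device by logical block number, while layer 3 addresses bytes in a file and leaves block placement to the file system. The device path, file path, and 512-byte block size are illustrative; reading a raw device usually needs elevated privileges.

```c
/* Sketch: logical block I/O to a device (layer 2) vs. file I/O (layer 3). */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLOCK_SIZE 512

int main(void)
{
    char buf[BLOCK_SIZE];

    /* Layer 2 (logical I/O): address the device by logical block number. */
    int dev = open("/dev/sda", O_RDONLY);
    if (dev >= 0) {
        off_t block_no = 100;                         /* arbitrary block */
        if (pread(dev, buf, BLOCK_SIZE, block_no * BLOCK_SIZE) == BLOCK_SIZE)
            printf("read logical block %lld from the device\n", (long long)block_no);
        close(dev);
    }

    /* Layer 3 (virtual I/O): address a file; the file system maps the
     * file offset to whatever blocks it allocated for that file. */
    int file = open("/etc/hostname", O_RDONLY);
    if (file >= 0) {
        ssize_t n = read(file, buf, sizeof buf);
        printf("read %zd bytes through the file system\n", n);
        close(file);
    }

    return 0;
}
```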
Before starting the application, I would just like to know the feasibility here.
I have around 15 GB of data (text and some images) stored in an SQLite database on my SD card, and I need to access it from my application. The data will grow daily and may reach 64 GB.
Can anyone tell me the limitations of accessing such a huge database on an SD card from the application?
SQLite itself supports databases in that range (16-32 GB); it may start to slow down, but it should still work.
However, you are likely to hit the FAT32 maximum file size, which is just 4 GB, and this will be tough to overcome. SQLite allows you to use attached databases, which let you split the data into smaller chunks, but this is really cumbersome (see the sketch below).
If you can format your SD card as ext4, or use internal storage as ext4, then you should not really have big problems.
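For what it's worth, here is a minimal sketch of the attached-databases workaround mentioned above, using the SQLite C API; the file paths and the items table are illustrative assumptions. Each file stays under the FAT32 per-file limit, and a view hides the split from the rest of the application.

```c
/* Sketch: split data across files with ATTACH DATABASE and query via a view. */
#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    char *err = NULL;

    if (sqlite3_open("/sdcard/data_part1.db", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }

    /* Attach a second file; both stay below the per-file size limit. */
    const char *sql =
        "ATTACH DATABASE '/sdcard/data_part2.db' AS part2;"
        /* A view that hides the split from the rest of the application. */
        "CREATE TEMP VIEW IF NOT EXISTS all_items AS "
        "  SELECT * FROM main.items UNION ALL SELECT * FROM part2.items;";

    if (sqlite3_exec(db, sql, NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "error: %s\n", err);
        sqlite3_free(err);
    }

    sqlite3_close(db);
    return 0;
}
```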
I'm using SQLite as temporary storage to calculate statistics about a moderately large data set.
I'm wondering what will happen if my database exceeds 2 GB on a 32-bit system. (I can't currently change the system to 64-bit.)
Does it use memory-mapped files and break if the file size exceeds addressable memory (like MongoDB)?
According to the SQLite documentation, the maximum size of a database file is ~140 terabytes, and in practice it is limited by the OS/file system.
You can read more here (note the Pages section): http://www.sqlite.org/fileformat2.html
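As a quick way to see where the theoretical ceiling comes from for a particular database, you can ask SQLite for its page size and maximum page count; their product is the largest file that database could grow to. The stats.db file name below is an illustrative assumption.

```c
/* Sketch: maximum database size = page_size * max_page_count. */
#include <stdio.h>
#include <sqlite3.h>

static sqlite3_int64 query_int(sqlite3 *db, const char *sql)
{
    sqlite3_stmt *stmt;
    sqlite3_int64 value = -1;

    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) == SQLITE_OK &&
        sqlite3_step(stmt) == SQLITE_ROW)
        value = sqlite3_column_int64(stmt, 0);
    sqlite3_finalize(stmt);
    return value;
}

int main(void)
{
    sqlite3 *db;
    if (sqlite3_open("stats.db", &db) != SQLITE_OK) return 1;

    sqlite3_int64 page_size = query_int(db, "PRAGMA page_size;");
    sqlite3_int64 max_pages = query_int(db, "PRAGMA max_page_count;");

    printf("page_size=%lld, max_page_count=%lld, theoretical max=%lld bytes\n",
           (long long)page_size, (long long)max_pages,
           (long long)(page_size * max_pages));

    sqlite3_close(db);
    return 0;
}
```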
Though this is an old question, let me share my findings for people who reach it.
Although the SQLite documentation states that the maximum size of a database file is ~140 terabytes, your OS imposes its own restrictions on the maximum file size for any type of file.
For example, if you are using a FAT32 disk on Windows, the maximum file size I could achieve for SQLite was 2 GB.
(According to the Microsoft site, the limit on a FAT32 system is 4 GB, but my SQLite database was still restricted to 2 GB.)
On Linux, I was able to reach 3 GB (where I stopped; it could have grown larger).
Find out the file system type of your partition. Remember that the file size limit does not depend on whether the OS is 32-bit or 64-bit, but on the file system of your partition.
See Wikipedia
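If it helps, here is a small Linux-specific sketch of the "find out your file system type" step: statfs() reports a magic number identifying the file system that holds a given path. The /home path is an illustrative assumption, and the magic constants come from <linux/magic.h>.

```c
/* Sketch (Linux): identify the file system type of the partition via statfs(). */
#include <stdio.h>
#include <sys/vfs.h>
#include <linux/magic.h>

int main(void)
{
    struct statfs fs;
    if (statfs("/home", &fs) != 0) { perror("statfs"); return 1; }

    printf("f_type = 0x%lx\n", (unsigned long)fs.f_type);
    if (fs.f_type == EXT4_SUPER_MAGIC)
        printf("ext4: per-file limit is far above any practical SQLite database\n");
    else if (fs.f_type == MSDOS_SUPER_MAGIC)
        printf("FAT variant: expect a 4 GB (or smaller) per-file limit\n");
    return 0;
}
```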
There's a following statement in SQLite FAQ:
A transaction normally requires two complete rotations of the disk platter, which on a 7200RPM disk drive limits you to about 60 transactions per second.
As far as I know, there is a cache on the hard disk, and there may also be an extra cache in the disk driver, which abstracts the operation perceived by the software from the actual operation against the disk platter.
Then why and how exactly are transactions so strictly bound to disk platter rotation?
From Atomic Commit In SQLite
2.0 Hardware Assumptions
SQLite assumes that the operating system will buffer writes and that a write request will return before data has actually been stored in the mass storage device. SQLite further assumes that write operations will be reordered by the operating system. For this reason, SQLite does a "flush" or "fsync" operation at key points. SQLite assumes that the flush or fsync will not return until all pending write operations for the file that is being flushed have completed. We are told that the flush and fsync primitives are broken on some versions of Windows and Linux. This is unfortunate. It opens SQLite up to the possibility of database corruption following a power loss in the middle of a commit. However, there is nothing that SQLite can do to test for or remedy the situation. SQLite assumes that the operating system that it is running on works as advertised. If that is not quite the case, well then hopefully you will not lose power too often.
Because it ensures data integrity by making sure the data is actually written to the disk rather than held in memory. Thus, if the power goes off or something, the database is not corrupted.
This video http://www.youtube.com/watch?v=f428dSRkTs4 talks about reasons why (e.g. because SQLite is actually used in a lot of embedded devices where the power might well suddenly go off.)
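A practical consequence of the rotation/fsync limit discussed above: each COMMIT costs at least one fsync, so wrapping many INSERTs in a single transaction pays that cost once instead of once per row. Below is a minimal sketch using the SQLite C API; the log.db file and log table are illustrative assumptions.

```c
/* Sketch: batch many INSERTs into one transaction to amortize the fsync cost. */
#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    char *err = NULL;

    if (sqlite3_open("log.db", &db) != SQLITE_OK) return 1;
    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS log(msg TEXT);", NULL, NULL, NULL);

    /* One transaction, one commit: roughly one fsync for 10,000 rows,
     * instead of 10,000 autocommit transactions (and their fsyncs). */
    sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);

    sqlite3_stmt *stmt;
    sqlite3_prepare_v2(db, "INSERT INTO log(msg) VALUES (?);", -1, &stmt, NULL);
    for (int i = 0; i < 10000; i++) {
        sqlite3_bind_text(stmt, 1, "event", -1, SQLITE_STATIC);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
    }
    sqlite3_finalize(stmt);

    if (sqlite3_exec(db, "COMMIT;", NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "commit failed: %s\n", err);
        sqlite3_free(err);
    }

    sqlite3_close(db);
    return 0;
}
```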