In all optimization guides people talk about pragmas like JOURNAL_MODE or SYNCHRONOUS, but I have never read anything about the TEMP_STORE pragma. I would expect it to have a large impact, so why is it never mentioned?
Its purpose is to move all of SQLite's internal temporary tables from disk (temp directory) to memory, which seems a lot faster than hitting the disk on every SELECT?
SQLite locks the entire database when doing writes, so I would imagine that it is better to get the data onto the platters before proceeding with your next task.
Putting the data in memory is most likely reserved for those occasions when you would only want a temporary data store (as the TEMP_STORE name implies); you would still need to provide a method for periodically flushing the data to disk (if you want to save it), and since the locking is not granular, you would have to flush the entire database.
In other words, TEMP_STORE is not a caching mechanism.
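For reference, here is a minimal sketch of setting the pragma from Python's built-in sqlite3 module. The helper name use_memory_temp_store is made up for illustration. Note that the setting is per connection and only affects temporary tables and indices, not the main database file.

```python
import sqlite3


def use_memory_temp_store(conn: sqlite3.Connection) -> int:
    """Ask SQLite to keep temporary tables and indices in memory.

    Returns the value the pragma reports back:
    0 = DEFAULT, 1 = FILE, 2 = MEMORY.
    """
    conn.execute("PRAGMA temp_store = MEMORY")
    return conn.execute("PRAGMA temp_store").fetchone()[0]


# Any connection works here; the setting applies to this connection only.
conn = sqlite3.connect(":memory:")
mode = use_memory_temp_store(conn)
print("temp_store =", mode)
conn.close()
```

The main database is still written to disk as usual, which is consistent with the point above that TEMP_STORE is not a cache for your data.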
I have an application which uses the QSQLITE driver with a QSqlDatabase on a file on the local filesystem. I want to write a backup function which will save a snapshot of the database.
Simply copying the file seems like an obvious, easy way to do it but I'm not sure when it is safe to do so.
The application modifies the database at well-defined points. Each time, a new QSqlQuery object is created, used, and immediately destroyed. Explicitly locking/flushing is an acceptable solution, but the Qt API doesn't seem to expose this.
I can't find any documentation on when Qt commits the database to disk. I imagine the QSqlDatabase destructor would do it, but even then I also don't know if (on Windows or Linux) copying the file is guaranteed to result in the most-recent changes being copied (as opposed to, say, only those changes which have been finalised in the filesystem journal). Can someone confirm or deny this? Does it make any difference if the writing filehandle is closed before the copy is executed?
Maybe the only safe way is to do an online copy but I am already using the Qt API and don't know how this would interact.
Any advice would be appreciated.
It's trivial to copy a SQLite database but it's less trivial to do this in a way that won't corrupt it. The following will give you a nice clean backup that's sure to be in a proper state, since writing to the database half-way through your copying process is impossible.
QSqlQuery qry(db);
// Take the write lock first; only copy if we actually got it.
if (qry.exec("BEGIN IMMEDIATE;")) {
    QFile::copy(databaseName, destination);
    // Nothing was modified, so ROLLBACK just releases the lock.
    qry.exec("ROLLBACK;");
}
After a BEGIN IMMEDIATE, no other database connection will be able to write to the database or do a BEGIN IMMEDIATE or BEGIN EXCLUSIVE.
This has very little to do with Qt. It is database related. This procedure will work with any ACID compliant database, and SQLite is one of these.
From http://www.sqlite.org/transactional.html
SQLite is Transactional
A transactional database is one in which all changes and queries appear to be Atomic, Consistent, Isolated, and Durable (ACID). SQLite implements serializable transactions that are atomic, consistent, isolated, and durable, even if the transaction is interrupted by a program crash, an operating system crash, or a power failure to the computer.
This does not mean you can copy the file and it will be consistent. You should probably use block level snapshots for this before you copy. If you are using Linux, read this,
http://tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html
The procedure would then be,
snapshot
copy DB from snapshot to backup device
remove snapshot volume
A snapshot is a global "freeze" of the file system, and the frozen image is consistent because of ACID. A file copy is a linear operation, which cannot be guaranteed to be consistent without halting all DB operations for the duration of the copy. This means a straight copy is not safe for online databases (in general).
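If block-level snapshots are not available, SQLite's own online backup API is another option. A rough sketch using Python's sqlite3 module (Connection.backup needs Python 3.7 or later; the helper name, paths and table are made up for the demo):

```python
import os
import sqlite3
import tempfile


def online_backup(source: sqlite3.Connection, dest_path: str) -> None:
    """Copy a live database into dest_path using SQLite's backup API."""
    dest = sqlite3.connect(dest_path)
    try:
        # SQLite copies the pages itself and copes with writes that
        # happen on the source while the backup is in progress.
        source.backup(dest)
    finally:
        dest.close()


# Hypothetical source database with a couple of rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE t (x INTEGER)")
source.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
source.commit()

dest_path = os.path.join(tempfile.mkdtemp(), "backup.db")
online_backup(source, dest_path)
source.close()

check = sqlite3.connect(dest_path)
rows = check.execute("SELECT x FROM t ORDER BY x").fetchall()
check.close()
print(rows)
```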
I have a system where multiple processes successfully share a single SQLite disk based database. The size and nature of the database is such that faster access is always desirable and database is temporary anyway, so keeping it fully in memory sounds like a good idea. I know SQLite supports in memory databases but it appears as if there is no way to share an in-memory database with another process (or at least this is how I understand it). Considering SQLite seems to use file mappings I see no reason why a process-shared in-memory database could not exist (at least in theory).
I am keen to know if anybody knows a way to do this or has some other suggestion.
It is true that SQLite does not support sharing an in-memory database with other processes. There is little reason to implement such a feature, because the use cases are mostly artificial. You cite performance as a use case, but you can just create a file-based database on a tmpfs if you are on Linux. Otherwise you can still use a number of pragmas, such as PRAGMA synchronous=OFF; to speed up your database by giving up durability. Going further, you can use PRAGMA journal_mode=MEMORY; to prepare commits in memory or even use PRAGMA journal_mode=OFF; if you do not need transaction support at all.
One of the main reasons for the lack of support is the need for locking. SQLite needs some means to lock the database, and currently these locking operations are tied to the file operations in the SQLite VFS implementation. You might still be able to implement your own VFS module that works in memory, but you risk implementing a filesystem.
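As a small sketch of what applying those pragmas looks like from Python's sqlite3 module (the file name scratch.db is made up, and on Linux you could place it on a tmpfs mount instead of the temp directory used here):

```python
import os
import sqlite3
import tempfile

# Throwaway file-based database; the path is only for the demo.
path = os.path.join(tempfile.mkdtemp(), "scratch.db")
conn = sqlite3.connect(path)

# Give up durability: do not wait for the OS to flush writes to disk.
conn.execute("PRAGMA synchronous = OFF")

# Keep the rollback journal in memory instead of in a file next to the db.
# The pragma returns the journal mode that is now in effect.
journal_mode = conn.execute("PRAGMA journal_mode = MEMORY").fetchone()[0]
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]

print(journal_mode, synchronous)
conn.close()
```

Both settings trade crash safety for speed, which is fine here only because the database is described as temporary anyway.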
Does SQLite3 safely handle concurrent access by multiple processes reading/writing from the same DB? Are there any platform exceptions to that?
If most of those concurrent accesses are reads (e.g. SELECT), SQLite can handle them very well. But if you start writing concurrently, lock contention could become an issue. A lot would then depend on how fast your filesystem is, since the SQLite engine itself is extremely fast and has many clever optimizations to minimize contention. Especially SQLite 3.
For most desktop/laptop/tablet/phone applications, SQLite is fast enough as there's not enough concurrency. (Firefox uses SQLite extensively for bookmarks, history, etc.)
For server applications, somebody some time ago said that anything less than 100K page views a day could be handled perfectly by a SQLite database in typical scenarios (e.g. blogs, forums), and I have yet to see any evidence to the contrary. In fact, with modern disks and processors, 95% of web sites and web services would work just fine with SQLite.
If you want really fast read/write access, use an in-memory SQLite database. RAM is several orders of magnitude faster than disk.
Yes it does.
Let's figure out why
SQLite is transactional
All changes within a single transaction in SQLite either occur completely or not at all
Such ACID support as well as concurrent reads/writes are provided in 2 ways: using so-called journaling (let's call it the “old way”) or write-ahead logging (let's call it the “new way”)
Journaling (Old Way)
In this mode SQLite uses DATABASE-LEVEL locking.
This is the crucial point to understand.
That means whenever it needs to read/write something it first acquires a lock on the ENTIRE database file.
Multiple readers can co-exist and read something in parallel.
During writing it makes sure an exclusive lock is acquired and no other process is reading/writing simultaneously and hence writes are safe.
(This is known as a multiple-readers-single-writer or MRSW lock)
This is why they say SQLite implements serializable transactions
Troubles
Because it needs to lock the entire database every time, and everybody has to wait for the process that is writing, concurrency suffers and concurrent reads/writes perform fairly poorly.
Rollbacks/outages
Prior to writing something to the database file, SQLite first saves the chunk to be changed in a temporary file (the rollback journal). If something crashes in the middle of writing to the database file, it picks up this temporary file and reverts the changes from it.
Write-Ahead Logging or WAL (New Way)
In this case all writes are appended to a temporary file (write-ahead log) and this file is periodically merged with the original database.
When SQLite is searching for something it would first check this temporary file and if nothing is found proceed with the main database file.
As a result, readers don’t compete with writers and performance is much better compared to the Old Way.
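A quick way to see this is a sketch like the following, using Python's sqlite3 module with a made-up table in a throwaway file. The writer leaves a transaction open, and a second connection with a zero timeout (so it would fail immediately if it were blocked) can still read the last committed state.

```python
import os
import sqlite3
import tempfile


def count_rows(conn: sqlite3.Connection) -> int:
    # fetchall() runs the statement to completion, so no read lock lingers.
    return conn.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]


path = os.path.join(tempfile.mkdtemp(), "wal_demo.db")

writer = sqlite3.connect(path, isolation_level=None)
mode = writer.execute("PRAGMA journal_mode = WAL").fetchone()[0]
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")

# Start a write transaction and leave it open.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO t VALUES (2)")

# timeout=0 means "raise at once instead of waiting for a lock".
reader = sqlite3.connect(path, timeout=0)
seen_during_write = count_rows(reader)  # only the committed row is visible

writer.execute("COMMIT")
seen_after_commit = count_rows(reader)

print(mode, seen_during_write, seen_after_commit)
reader.close()
writer.close()
```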
Caveats
SQLite heavily depends on the underlying filesystem locking functionality so it should be used with caution, more details here
You're also likely to run into the database is locked error, especially in the journaled mode so your app needs to be designed with this error in mind
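One way to design for that error, sketched with Python's sqlite3 module: give the connection a busy timeout and/or retry when the error shows up. The helper run_with_retry and its parameters are made up for illustration, and matching on the message text is a simplification.

```python
import os
import sqlite3
import tempfile
import time


def run_with_retry(conn, sql, params=(), attempts=5, delay=0.05):
    """Run one statement, retrying while SQLite reports a lock conflict."""
    for attempt in range(attempts):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear back-off


path = os.path.join(tempfile.mkdtemp(), "busy_demo.db")

holder = sqlite3.connect(path, isolation_level=None)
holder.execute("CREATE TABLE t (x INTEGER)")
holder.execute("BEGIN IMMEDIATE")  # hold the write lock

# timeout=0 disables waiting, so the conflict surfaces immediately.
other = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    other.execute("BEGIN IMMEDIATE")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

holder.execute("ROLLBACK")  # release the lock

run_with_retry(other, "INSERT INTO t VALUES (?)", (1,))
inserted = other.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

print(error_message, inserted)
other.close()
holder.close()
```

In practice a non-zero timeout on sqlite3.connect already makes SQLite wait for the lock, so the explicit retry is mostly a safety net for long-running writers.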
Yes, SQLite handles concurrency well, but it isn't the best from a performance angle. From what I can tell, there are no exceptions to that. The details are on SQLite's site: https://www.sqlite.org/lockingv3.html
This statement is of interest: "The pager module makes sure changes happen all at once, that either all changes occur or none of them do, that two or more processes do not try to access the database in incompatible ways at the same time"
Nobody seems to have mentioned WAL (Write Ahead Log) mode. Make sure the transactions are properly organised, and with WAL mode on there is no need to keep readers locked out of the database while an update is going on.
The only issue is that at some point the WAL needs to be re-incorporated into the main database (a checkpoint). By default SQLite does this automatically once the WAL grows to 1000 pages, and again when the last connection to the database closes. With a very busy site you might find a checkpoint takes a few seconds to get through, but 100K hits per day should not be a problem.
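You can also trigger the merge yourself rather than waiting for it. A small sketch with Python's sqlite3 module (file and table names are made up): PRAGMA wal_autocheckpoint reports the automatic threshold in pages, and PRAGMA wal_checkpoint(TRUNCATE) runs a checkpoint and returns one row of (busy, WAL frames, frames checkpointed).

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "checkpoint_demo.db")
conn = sqlite3.connect(path, isolation_level=None)
conn.execute("PRAGMA journal_mode = WAL")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")

# Number of WAL pages after which SQLite checkpoints on its own.
autocheckpoint_pages = conn.execute("PRAGMA wal_autocheckpoint").fetchone()[0]

# Force a checkpoint now. busy is 0 when it was not blocked by readers.
busy, wal_frames, checkpointed = conn.execute(
    "PRAGMA wal_checkpoint(TRUNCATE)"
).fetchone()

print(autocheckpoint_pages, busy, wal_frames, checkpointed)
conn.close()
```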
In 2019, there are two new concurrent write options not released yet but available in separate branches.
"PRAGMA journal_mode = wal2"
The advantage of this journal mode over regular "wal" mode is that writers may continue writing to one wal file while the other is checkpointed.
BEGIN CONCURRENT
The BEGIN CONCURRENT enhancement allows multiple writers to process write transactions simultaneously if the database is in "wal" or "wal2" mode, although the system still serializes COMMIT commands.
When a write-transaction is opened with "BEGIN CONCURRENT", actually locking the database is deferred until a COMMIT is executed. This means that any number of transactions started with BEGIN CONCURRENT may proceed concurrently. The system uses optimistic page-level-locking to prevent conflicting concurrent transactions from being committed.
Both are available together in the begin-concurrent-wal2 branch, or each in its own separate branch.
SQLite has a readers-writer lock on the database level. Multiple connections (possibly owned by different processes) can read data from the same database at the same time, but only one can write to the database.
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution. -- Appropriate Uses For SQLite # SQLite.org
The readers-writer lock enables independent transaction processing and it is implemented using exclusive and shared locks on the database level.
An exclusive lock must be obtained before a connection performs a write operation on a database. After the exclusive lock is obtained, both read and write operations from other connections are blocked till the lock is released again.
Implementation details for the case of concurrent writes
SQLite has a lock table that helps locking the database as late as possible during a write operation to ensure maximum concurrency.
The initial state is UNLOCKED, and in this state, the connection has not accessed the database yet. When a process is connected to a database and even a transaction has been started with BEGIN, the connection is still in the UNLOCKED state.
After the UNLOCKED state, the next state is the SHARED state. In order to be able to read (not write) data from the database, the connection must first enter the SHARED state, by getting a SHARED lock.
Multiple connections can obtain and maintain SHARED locks at the same time, so multiple connections can read data from the same database at the same time. But as long as even only one SHARED lock remains unreleased, no connection can successfully complete a write to the database.
If a connection wants to write to the database, it must first get a RESERVED lock.
Only a single RESERVED lock may be active at one time, though multiple SHARED locks can coexist with a single RESERVED lock. RESERVED differs from PENDING in that new SHARED locks can be acquired while there is a RESERVED lock. -- File Locking And Concurrency In SQLite Version 3 # SQLite.org
Once a connection obtains a RESERVED lock, it can start processing database modification operations, though these modifications can only be done in the buffer, rather than actually written to disk. The modifications made to the readout content are saved in the memory buffer.
When a connection wants to commit a modification (or transaction), it has to upgrade the RESERVED lock to an EXCLUSIVE lock. To get there, the lock must first be upgraded to a PENDING lock.
A PENDING lock means that the process holding the lock wants to write to the database as soon as possible and is just waiting on all current SHARED locks to clear so that it can get an EXCLUSIVE lock. No new SHARED locks are permitted against the database if a PENDING lock is active, though existing SHARED locks are allowed to continue.
An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
-- File Locking And Concurrency In SQLite Version 3 # SQLite.org
So you might say SQLite safely handles concurrent access by multiple processes writing to the same DB simply because it doesn't support it! You will get SQLITE_BUSY or SQLITE_LOCKED for the second writer when it hits the retry limit.
This thread is old, but I think it would be good to share the results of the tests I did on SQLite:
I ran 2 instances of a Python program (different processes, same program) executing SELECT and UPDATE statements within a transaction with an EXCLUSIVE lock and a timeout of 10 seconds to get the lock, and the results were frustrating. Each instance did the following in a 10000-step loop:
connect to db with exclusive lock
select on one row to read counter
update the row with new value equal to counter incremented by 1
close connection to db
Even though SQLite granted an exclusive lock on the transaction, the total number of cycles actually executed was not 20 000 but less (the total number of iterations over the single counter, counted for both processes).
The Python program threw almost no exceptions (only once, during a SELECT, across 20 executions).
The SQLite version at the time of the test was 3.6.20, with Python v3.3 on CentOS 6.5.
In my opinion it is better to find a more reliable product for this kind of job, or to restrict writes to SQLite to a single process/thread.
This is to be expected: if you give the database a fixed name, even for an in-memory database, you will get this error under concurrent access (especially writes).
In my case, I am using SQLite for testing, and it happens because there are several tests in the same solution.
You can make two improvements:
Delete the database before creating it: db.Database.EnsureDeletedAsync();
Use an empty string for the connection string; in this case it will create a random name on each call:
{
"ConnectionStrings": {
"ConnectionType": "sqlite",
"ConnectionString": ""
}
}
I just started playing with Berkeley DB a few days ago so I'm trying to see if there's something I've been missing when it comes to storing data as fast as possible.
Here's some info about the data:
- it comes in 512 byte chunks
- chunks come in order
- chunks will be deleted in FIFO order
- if i lose some data off the end because of power failure that's ok as long as the whole db isn't broken
After reading a bunch of the documentation it seemed like a Queue db was exactly what I wanted.
However, after trying some test code my fastest results were about 1MByte per second just looping through a DB->put with DB_APPEND set. I also tried using transactions and bulk puts but both of these slowed things down considerably so I didn't pursue them for much time. I was inserting into a fresh db created on a NANDFlash chip on my Freescale i.MX35 dev board.
Since we're looking to get at least 2MBytes per second write speeds, I was wondering if there's something I missed that can improve my speeds since I know that my hardware can write faster than this.
Try putting this into your DB_CONFIG:
set_flags DB_TXN_WRITE_NOSYNC
set_flags DB_TXN_NOSYNC
From my experience, these increase write performance a lot.
DB_TXN_NOSYNC
If set, Berkeley DB will not write or synchronously flush the log on transaction commit or prepare. This means that transactions exhibit the ACI (atomicity, consistency, and isolation) properties, but not D (durability); that is, database integrity will be maintained, but if the application or system fails, it is possible some number of the most recently committed transactions may be undone during recovery. The number of transactions at risk is governed by how many log updates can fit into the log buffer, how often the operating system flushes dirty buffers to disk, and how often the log is checkpointed
Calling DB_ENV->set_flags with the DB_TXN_NOSYNC flag only affects the specified DB_ENV handle (and any other Berkeley DB handles opened within the scope of that handle). For consistent behavior across the environment, all DB_ENV handles opened in the environment must either set the DB_TXN_NOSYNC flag or the flag should be specified in the DB_CONFIG configuration file.
The DB_TXN_NOSYNC flag may be used to configure Berkeley DB at any time during the life of the application.
DB_TXN_WRITE_NOSYNC
If set, Berkeley DB will write, but will not synchronously flush, the log on transaction commit or prepare. This means that transactions exhibit the ACI (atomicity, consistency, and isolation) properties, but not D (durability); that is, database integrity will be maintained, but if the system fails, it is possible some number of the most recently committed transactions may be undone during recovery. The number of transactions at risk is governed by how often the system flushes dirty buffers to disk and how often the log is checkpointed.
Calling DB_ENV->set_flags with the DB_TXN_WRITE_NOSYNC flag only affects the specified DB_ENV handle (and any other Berkeley DB handles opened within the scope of that handle). For consistent behavior across the environment, all DB_ENV handles opened in the environment must either set the DB_TXN_WRITE_NOSYNC flag or the flag should be specified in the DB_CONFIG configuration file.
The DB_TXN_WRITE_NOSYNC flag may be used to configure Berkeley DB at any time during the life of the application.
See http://www.mathematik.uni-ulm.de/help/BerkeleyDB/api_c/env_set_flags.html for more details.
I suggest you use transactions (the TDS datastore) if, as you mention, you cannot recreate the database when it gets corrupted (i.e. it isn't just a local cache). If you don't care about losing a few items in the event of a crash or power outage, then DB_TXN_WRITE_NOSYNC will improve TDS performance, and your database will still be intact and recoverable.
If you store using BTREE with a numeric index (if you have no natural key), and watch out for endian issues so you get good key locality and high page utilization, then you should be able to get way more than 2000 inserts a second, especially to SSD, and especially if you use DbMultipleKeyDataBuilder to do bulk inserts.
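To illustrate the endian point: Berkeley DB's default BTREE comparison is a byte-by-byte lexical comparison, so a numeric key only sorts in numeric order (and sequential inserts only land next to each other) if it is stored big-endian. The sketch below does not touch Berkeley DB at all; it just compares the two encodings with Python's struct module, and the helper names are made up.

```python
import struct


def big_endian_key(n: int) -> bytes:
    """Encode n as 8 bytes, most significant byte first."""
    return struct.pack(">Q", n)


def little_endian_key(n: int) -> bytes:
    """Encode n as 8 bytes, least significant byte first."""
    return struct.pack("<Q", n)


numbers = [1, 255, 256, 257, 65536]

# Sorting by the raw key bytes mimics a lexical byte comparison.
big_sorted = sorted(numbers, key=big_endian_key)
little_sorted = sorted(numbers, key=little_endian_key)

print(big_sorted)     # numeric order is preserved
print(little_sorted)  # numeric order is scrambled
```

With little-endian keys, consecutive record numbers end up scattered across the tree, which hurts locality and page fill.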
I'm investigating SQLite as a storage engine, and am curious to know whether SQLite locks the database file on reads.
I am concerned about read performance as my planned project will have few writes, but many reads. If the database does lock, are there measures that can be taken (such as memory caching) to mitigate this?
You can avoid locks when reading, if you set database journal mode to Write-Ahead Logging (see: http://www.sqlite.org/wal.html).
From its Wikipedia page:
Several computer processes or threads may access the same database without problems. Several read accesses can be satisfied in parallel.
More precisely, from its FAQ:
Multiple processes can have the same database open at the same time. Multiple processes can be doing a SELECT at the same time. But only one process can be making changes to the database at any moment in time, however.
A single write to the database, however, does lock the database for a short time so nothing can access it at all (not even reading). Details may be found in File Locking And Concurrency In SQLite Version 3. Basically reading the database is no problem unless someone wants to write to the database immediately. In that case the DB is locked exclusively for the time it takes to execute that transaction and the lock is released afterwards. However, details are scarce on what exactly happens with read operations on the database during a PENDING or EXCLUSIVE lock. My guess is that they either return SQLITE_BUSY or block until they can read. In the first case, it shouldn't be too hard to just try again, especially if you are expecting few writes.
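That guess is easy to check with a small experiment. The sketch below uses Python's sqlite3 module in the default rollback-journal mode with a made-up table. With timeout=0 the blocked read fails at once with a "database is locked" error (SQLITE_BUSY); with a non-zero timeout SQLite would instead wait up to that long before giving up.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "exclusive_demo.db")

writer = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")

# Take the EXCLUSIVE lock and hold it.
writer.execute("BEGIN EXCLUSIVE")

# timeout=0: do not wait for the lock, report the conflict immediately.
reader = sqlite3.connect(path, timeout=0)
try:
    reader.execute("SELECT COUNT(*) FROM t").fetchall()
    read_error = None
except sqlite3.OperationalError as exc:
    read_error = str(exc)

writer.execute("ROLLBACK")  # release the lock

# The same read now succeeds.
count_after = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

print(read_error, count_after)
reader.close()
writer.close()
```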
Adding more info for this answer:
Q: Does SQLite lock the database file on reads?
A: No and Yes
Ref: https://www.sqlite.org/atomiccommit.html#_acquiring_a_read_lock
The first step toward reading from the database file is obtaining a shared lock on the database file. A "shared" lock allows two or more database connections to read from the database file at the same time. But a shared lock prevents another database connection from writing to the database file while we are reading it