minimize disk IO with sqlalchemy/sqlite in read-only

I am modifying a project that opens a sqlite database through sqlalchemy.
I am running multiple processes of this application, which seems to cause some (locking?) issues. The processes stall waiting for IO (state D in top).
The database is only queried / read, never written to. Unfortunately, the 5GB database file is on a NFS directory, so reading is not instantaneous.
Can I set the database to read-only? Will sqlite/sqlalchemy then avoid the locking/transaction mechanism? Setting isolation_level="READ UNCOMMITTED" seems not to have been enough. It's not clear to me from the documentation how and whether I should set SERIALIZABLE.
Alternatively, I guess I could copy the db to memory first, but it is not clear how to hand the memory database over to sqlalchemy. Should I copy on connect?

I was able to solve it by copying the database into memory: I replaced

ENGINE = create_engine('sqlite:///' + DATABASE_FILE)

with:

import sqlite3
from sqlalchemy import create_engine

# An empty path gives an in-memory SQLite database; SQLAlchemy keeps a
# single connection alive for it, so the copied data stays visible.
ENGINE = create_engine('sqlite:///')

# Open the file database read-only and copy it into the in-memory
# database behind the engine (Connection.backup requires Python 3.7+).
filedb = sqlite3.connect('file:' + DATABASE_FILE + '?mode=ro', uri=True)
print("loading database to memory ...")
filedb.backup(ENGINE.raw_connection().connection)
print("loading database to memory ... done")
filedb.close()

followed by

from sqlalchemy.orm import declarative_base, sessionmaker

BASE = declarative_base()
SESSION = sessionmaker(bind=ENGINE)
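
An alternative that avoids loading the whole 5GB file into RAM: open the database through a SQLite URI with mode=ro, and optionally immutable=1, which tells SQLite the file cannot change and lets it skip the locking machinery entirely. A sketch, assuming SQLAlchemy 1.4+ (which passes the URI through to SQLite when uri=true is present) and that no process ever writes to the file:

from sqlalchemy import create_engine

# mode=ro opens the database read-only; immutable=1 additionally disables
# all locking and change detection -- only safe if the file never changes.
ENGINE = create_engine(
    'sqlite:///file:' + DATABASE_FILE + '?mode=ro&immutable=1&uri=true'
)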

Related

sqlite .backup command fails when another process writes to the database (Error: database is locked)

The goal is to complete an online backup while other processes write to the database.
I connect to the sqlite database via the command line, and run
.backup mydatabase.db
During the backup, another process writes to the database and I immediately receive the message
Error: database is locked
and the backup disappears (reverts to a size of 0).
During the backup process there is a journal file, although it never gets very large. I checked that the journal_size_limit pragma is set to -1, which I believe means it's unlimited. My understanding is that writes to the database should go to the journal during the backup process, but maybe I'm wrong. I'm new to sqlite and databases in general.
Am I going about this the wrong way?
If the sqlite3 backup writes "Error: database is locked", then you should use
sqlite3 source.db ".timeout 10000" ".backup backup.db"
See also Increase the lock timeout with sqlite, and what is the default values? about default timeouts (spoiler: the CLI default is zero). And once backups work, you can switch SQLite to WAL mode, so that readers and the writer no longer block each other.
//writing this as an answer so it would be easier to google this, thanks guys!
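
The same fix from Python, for anyone driving the backup from code rather than the shell (a sketch using the file names from the question):

import sqlite3

# timeout=10 sets a 10-second busy timeout; the sqlite3 CLI defaults to
# none at all, while Python's sqlite3 module defaults to 5 seconds.
src = sqlite3.connect('mydatabase.db', timeout=10)
dst = sqlite3.connect('backup.db')
src.backup(dst)   # online backup API, Python 3.7+
src.close()
dst.close()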

Open an existing SQLite database in memory with Lua

The code in my project now:
local lsqlite3 = require "lsqlite3complete"
self.db_conn = lsqlite3.open("cost.db")

function showrow(udata, cols, values, names)
    assert(udata == 'test_udata')
    for i = 1, cols do
        print('', names[i], values[i])
    end
    return 0
end

self.db_conn:exec('select * from cost', showrow, 'test_udata')
Selecting the cost records with the code above is no problem, but if I change it as below and try to open the database in memory:
self.db_conn = lsqlite3.open_memory("cost.db")
the code raises no error, but there are no records or tables when I query it. How can I change my code so that the database is opened and kept in memory? I would like to access my data quickly from memory instead of connecting to the database file each time.
A memory database is one that exists only in memory. That is, it doesn't get its data from a file. Because of that, open_memory doesn't take any parameters.
If you want to use a database that lives in a file, then that means accessing that file.
You should not need to "keep connecting to a database". You connect to it once at the beginning of the application and keep it open until your application terminates.
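
That said, the underlying goal (a RAM copy of an existing file database) is achievable with SQLite's online backup API, where the binding exposes it. A sketch of the idea in Python terms, using the table name from the question:

import sqlite3

# Copy the file database into a fresh in-memory database once at startup,
# then run all queries against the RAM copy (backup needs Python 3.7+).
filedb = sqlite3.connect('file:cost.db?mode=ro', uri=True)
memdb = sqlite3.connect(':memory:')
filedb.backup(memdb)
filedb.close()

for row in memdb.execute('SELECT * FROM cost'):
    print(row)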

SQLite WAL mode does not support memory mode?

I create my sqlite DB connection by using memory mode:
connection = DriverManager.getConnection("jdbc:sqlite:file:" + dbName + "?mode=memory&cache=shared")
And then set WAL mode by executing "pragma journal_mode=WAL".
I found that memory usage keeps growing during insert/delete operations, and it is not released even when I call wal_checkpoint(TRUNCATE).
An in-memory database only supports the MEMORY journal mode (or no journal at all). Using a journal file for a database that only exists in RAM and goes away when the program that uses it exits doesn't make any sense.
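
You can see this directly: on an in-memory database the journal_mode pragma ignores the WAL request and reports the mode actually in effect (a small sketch in Python; the JDBC connection above behaves the same way):

import sqlite3

mem = sqlite3.connect(':memory:')
# The request for WAL is ignored on an in-memory database;
# the pragma answers with the journal mode actually in use.
print(mem.execute('PRAGMA journal_mode=WAL').fetchone())   # ('memory',)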

SQLite Backup Strategy

I'm trying to backup my sqlite database from a cronjob that runs in 5 minute intervals. The database is "live", so there are running queries at the time I want to perform the backup.
I want to be sure that the database is in good shape when I back it up, so that I can rely on the backup.
My current strategy (sketched as a shell function):
backup() {
    # Try to acquire the lock for 2 seconds, then check the database integrity.
    result=$(sqlite3 mydb.sqlite '.timeout 2000' 'PRAGMA integrity_check;')
    if [ "$result" = "ok" ]; then
        # Perform the backup to backup.sqlite.
        if sqlite3 mydb.sqlite '.timeout 2000' '.backup backup.sqlite'; then
            # Check the consistency of the backup database.
            result=$(sqlite3 backup.sqlite 'PRAGMA integrity_check;')
            [ "$result" = "ok" ] && return 0
        fi
    fi
    return 1
}
Now, there are some problems with my strategy:
If the live database is locked, I run into problems because I cannot perform the backup then. Maybe a transaction could help there?
If something goes wrong between the PRAGMA integrity_check; and the backup, I'm f*cked.
Any ideas? And by the way, what is the difference between the sqlite3 .backup and a good old cp mydb.sqlite mybackup.sqlite?
[edit] I'm running nodejs on an embedded system, so if someone suggests the sqlite online backup api with some ruby wrapper - no chance ;(
If you want to back up while queries are running, you need to use the backup API. The documentation has a worked-out example of an online backup of a running database (example 2). I don't understand the Ruby reference; you can integrate it in your program or do it as a small C program running beside the real application -- I've done both.
An explicit integrity_check on the backup is overkill. The backup API guarantees that the destination database is consistent and up-to-date. (The flip side of that coin is that if you update the DB too often while a backup is running, the backup might never finish.)
It is possible to use 'cp' to make a backup, but not of a running database. You need to have an exclusive lock for the entire duration of the backup, so it's not really 'live'. You also need to be careful to copy all of sqlite's temp files as well as the main database.
I'd expect the sqlite3 ".backup" command to use the backup API.
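
For what it's worth, Python's sqlite3 module wraps that same API, including the incremental form of the documentation's example 2 (a sketch using the question's file names):

import sqlite3

src = sqlite3.connect('mydb.sqlite')
dst = sqlite3.connect('backup.sqlite')

# Copy a few pages at a time, sleeping between batches, so writers on
# the source database are never blocked for long (the "online" part).
src.backup(dst, pages=100, sleep=0.25)

dst.close()
src.close()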
If you cannot use the backup API, you must use another mechanism to prevent the database file from being modified while you're copying it.
Start a transaction with BEGIN IMMEDIATE:
After a BEGIN IMMEDIATE, no other database connection will be able to write to the database or do a BEGIN IMMEDIATE or BEGIN EXCLUSIVE. Other processes can continue to read from the database, however.
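
In Python terms that copy-under-lock approach looks roughly like this (a sketch; note that in WAL mode the -wal file would also have to be copied, which is one more reason to prefer the backup API):

import shutil
import sqlite3

con = sqlite3.connect('mydb.sqlite', isolation_level=None)  # autocommit
con.execute('BEGIN IMMEDIATE')  # blocks writers; readers may continue
try:
    shutil.copy('mydb.sqlite', 'mybackup.sqlite')
finally:
    con.execute('ROLLBACK')  # nothing was changed; just release the lock
    con.close()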

Can I read and write to a SQLite database concurrently from multiple connections?

I have a SQLite database that is used by two processes. I am wondering, with the most recent version of SQLite, while one process (connection) starts a transaction to write to the database will the other process be able to read from the database simultaneously?
I collected information from various sources, mostly from sqlite.org, and put them together:
First, by default, multiple processes can have the same SQLite database open at the same time, and several read accesses can be satisfied in parallel.
When writing, a single write locks the database for a short time; during that moment nothing, not even a read, can access the database file at all.
Beginning with version 3.7.0, a new “Write Ahead Logging” (WAL) option is available, in which reading and writing can proceed concurrently.
By default, WAL is not enabled. To turn WAL on, refer to the SQLite documentation.
SQLite3 explicitly allows multiple connections:
(5) Can multiple applications or multiple instances of the same
application access a single database file at the same time?
Multiple processes can have the same database open at the same time.
Multiple processes can be doing a SELECT at the same time. But only
one process can be making changes to the database at any moment in
time, however.
For sharing connections, use SQLite3 shared cache:
Starting with version 3.3.0, SQLite includes a special "shared-cache"
mode (disabled by default)
In version 3.5.0, shared-cache mode was modified so that the same
cache can be shared across an entire process rather than just within a
single thread.
5.0 Enabling Shared-Cache Mode
Shared-cache mode is enabled on a per-process basis. Using the C
interface, the following API can be used to globally enable or disable
shared-cache mode:
int sqlite3_enable_shared_cache(int);
Each call sqlite3_enable_shared_cache() effects subsequent database
connections created using sqlite3_open(), sqlite3_open16(), or
sqlite3_open_v2(). Database connections that already exist are
unaffected. Each call to sqlite3_enable_shared_cache() overrides all
previous calls within the same process.
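
A small self-contained demonstration of the WAL behaviour described above (a sketch, assuming Python's sqlite3 module and a throwaway file demo.db):

import sqlite3

writer = sqlite3.connect('demo.db', isolation_level=None)  # autocommit
writer.execute('PRAGMA journal_mode=WAL')
writer.execute('CREATE TABLE IF NOT EXISTS t (x INTEGER)')
writer.execute('BEGIN')
writer.execute('INSERT INTO t VALUES (1)')   # write transaction still open

# In WAL mode a second connection can read while the write is pending;
# it sees the last committed state, not the uncommitted insert.
reader = sqlite3.connect('demo.db')
print(reader.execute('SELECT COUNT(*) FROM t').fetchone())

writer.execute('COMMIT')
reader.close()
writer.close()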
I had a similar code architecture to yours: a single SQLite database that process A read from while process B wrote to it concurrently, based on events (in Python 3.10.2, using the most up-to-date sqlite3 version). Process B was continually updating the database, while process A was reading from it to check data. My issue was that it worked in debug mode, but not in "release" mode.
In order to solve my particular problem I used Write Ahead Logging, which is referenced in previous answers. After creating my database in Process B (write mode) I added the line:
cur.execute('PRAGMA journal_mode=wal') where cur is the cursor object created from the connection.
This sets the journal to WAL mode, which allows concurrent access for multiple readers (but still only one writer). In Process A, where I was reading the data, before connecting to the same database I included:
time.sleep(0.5)
Setting a sleep timer before a connection was made to the same database fixed my issue with it not working in "release" mode.
In my case: I did not have to manually set any checkpoints, locks, or transactions. Your use case might be different than mine however, so research is most likely required. Nevertheless, I hope this post helps and saves everyone some time!
