How exactly does the SQLite cache_spill pragma work?

I would like to know exactly how the cache_spill = false pragma works. I understand that once the cache is full, it is written to disk even before a commit actually happens. I understand this can be problematic because it requires holding an exclusive lock from that moment until the actual commit takes place. I understand one could increase the cache size to ameliorate this potential problem. And I understand one would like to magically avoid any spill under such circumstances, although I don't believe the cache_spill pragma works in magical ways. So:
Does it cause further API calls that would require growing the cache to fail, signaling to the user that a commit is in order?
Or does it stop writing to the memory cache and use the disk instead, losing performance but avoiding the spill?

The cache spilling affected by this pragma happens only when the page cache hits its soft memory limit (the cache_size setting).
If you inhibit these spills, the changed pages are simply kept in memory.
This might result in an out-of-memory error if you need memory for more changed pages (or for anything else).
In practice, most operating systems will just swap some data out to disk, which is even less efficient because the data must be read back from swap before it can actually be committed.
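To make this concrete, here is a minimal sketch using Python's stdlib sqlite3 module (the pragmas themselves are driver-agnostic; the file name and cache size are arbitrary choices for illustration): disabling cache_spill while enlarging cache_size keeps dirty pages in RAM until the commit.

```python
import os
import sqlite3
import tempfile

# Hypothetical scratch database just for the demo.
path = os.path.join(tempfile.mkdtemp(), "spill-demo.sqlite")
con = sqlite3.connect(path, isolation_level=None)  # manage transactions by hand

# Keep dirty pages in RAM instead of spilling them to the db mid-transaction...
con.execute("PRAGMA cache_spill = false")
# ...and give the page cache headroom (negative value = size in KiB, here ~64 MiB).
con.execute("PRAGMA cache_size = -65536")

con.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v BLOB)")
con.execute("BEGIN")
con.executemany("INSERT INTO t VALUES (?, ?)",
                ((i, b"x" * 1024) for i in range(1000)))
con.execute("COMMIT")
```

With spilling disabled, the trade-off described above applies: a transaction larger than the enlarged cache risks an out-of-memory error rather than a mid-transaction spill.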


What could cause a sqlite application to slow down over time with high load?

I'll definitely need to update this based on feedback so I apologize in advance.
The problem I'm trying to solve is roughly this.
The graph shows Disk utilization in the Windows task manager. My sqlite application is a webserver that takes in json requests with timestamps, looks up the existing entry in a 2 column key/value table, merges the request into the existing item (they don't grow over time), and then writes it back to the database.
The db is created as follows. I've experimented with and without WAL without difference.
createStatement().use { it.executeUpdate("CREATE TABLE IF NOT EXISTS items ( key TEXT NOT NULL PRIMARY KEY, value BLOB );") }
The write/set is done as follows
try {
    val insertStatement = "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)"
    prepareStatement(insertStatement).use {
        it.setBytes(1, keySerializer.serialize(key))
        it.setBytes(2, valueSerializer.serialize(value))
        it.executeUpdate()
    }
    commit()
} catch (t: Throwable) {
    rollback()
    throw t
}
I use a single database connection the entire time which seems to be ok for my use case and greatly improves performance relative to getting a new one for each operation.
val databaseUrl = "jdbc:sqlite:${System.getProperty("java.io.tmpdir")}/$name-map-v2.sqlite"
if (connection == null || connection?.isClosed == true) {
    connection = DriverManager.getConnection(databaseUrl)
}
I'm effectively serializing access to the db. I'm pretty sure the default threading mode for the SQLite driver is serialized, and I'm also doing some serializing in Kotlin coroutines (via actors).
I'm load testing the application locally and I notice that disk utilization spikes around the one minute mark but I can't determine why. I know that throughput plummets when that happens though. I expect the server to chug along at a more or less constant rate. The db in these tests is pretty small too, hardly reaches 1mb.
Hoping people can recommend some next steps or set me straight as far as performance expectations go. I'm assuming there is some SQLite-specific thing that happens when throughput is very high for too long, but I would have thought that would be related to WAL or something (which I'm not using).
I have a theory but it's a bit farfetched.
The fact that you hit a performance wall after some time makes me think that either a buffer somewhere is filling up, or some other kind of data accumulation threshold is being reached.
Where exactly the culprit is, I'm not sure.
So, I'd run the following tests.
// At the beginning
connection.setAutoCommit(true);
If the problem is on the driver's side, in the rollback/transaction buffering, then this will slightly (hopefully) slow down operations, "spreading" the impact away from the one-minute mark: instead of getting fast operations for 59 seconds and then some seconds of full stop, you get not-so-fast operations the whole time.
In case the problem is further down the line, try:
PRAGMA JOURNAL_MODE=MEMORY
PRAGMA SYNCHRONOUS=OFF
The latter disables rollback journal synchronization (the data will be more at risk in case of a catastrophic power-down).
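As a sketch of those two pragmas (using Python's stdlib sqlite3 here rather than the asker's JDBC setup, but the pragmas are driver-agnostic; the file name is made up):

```python
import os
import sqlite3
import tempfile

# Hypothetical scratch database just for the demo.
path = os.path.join(tempfile.mkdtemp(), "journal-demo.sqlite")
con = sqlite3.connect(path)

# Keep the rollback journal in RAM instead of in a -journal file on disk.
# The pragma echoes back the mode that actually took effect.
mode = con.execute("PRAGMA journal_mode = MEMORY").fetchone()[0]

# Stop fsync()ing after every transaction (faster, but riskier on power loss).
con.execute("PRAGMA synchronous = OFF")
```

Both settings trade durability for speed, so they are best used as a diagnostic here: if the slowdown disappears, journal I/O or fsync stalls were the bottleneck.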
Finally, another possibility is that the page translation buffer gets filled after a sufficient number of different keys have been entered. You can test this directly with these two tests:
1) pre-fill the database with all the keys in ascending order and a large request, then start updating the same many keys.
2) run the test with only very few keys.
If the slowdown does not occur in the above cases, then either TLB management is not up to the challenge, or database fragmentation is the problem.
Issuing
PRAGMA PAGE_SIZE=32768
upon database creation might solve or mitigate the problem. Conversely, PRAGMA PAGE_SIZE=1024 could "spread" the problem, avoiding sharp bottlenecks.
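One caveat worth showing: page_size only takes effect before the first table is written (or after a VACUUM), so it has to be issued on a fresh database. A sketch in Python's stdlib sqlite3 (file name arbitrary, schema borrowed from the question):

```python
import os
import sqlite3
import tempfile

# Hypothetical fresh database; page_size cannot be changed on a populated file
# without a VACUUM.
path = os.path.join(tempfile.mkdtemp(), "page-demo.sqlite")
con = sqlite3.connect(path)

# Must run before any table is created; afterwards it is silently ignored.
con.execute("PRAGMA page_size = 32768")
con.execute("CREATE TABLE items (key TEXT NOT NULL PRIMARY KEY, value BLOB)")
con.commit()
```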
Another thing to try is closing the database connection and reopening it when it gets older than, say, 30 seconds. If this works, we'll still need to understand why it works (in this case I expect the JDBC driver to be at fault).
First of all, I should say that I don't use exactly your driver for SQLite, and I work with different devices (but how different are they, really?).
From what I can see (correct me if I'm wrong), you use one transaction per insert statement. For every request you hit the disk, you hit the memory, open, close, etc. That can't be fast.
The first thing I do when I have to do inserts in SQLite is to group them and use a single transaction for the whole batch. That way, you are using your resources in batches.
One transaction, many insert statements, a single commit. If there is a problem within a batch, handle the valid rows separately, log the faulty ones, and move on to the next batch of requests.
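A minimal sketch of that batching pattern (in Python's stdlib sqlite3 rather than the asker's JDBC driver; the table matches the question, the batch contents are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (key TEXT NOT NULL PRIMARY KEY, value BLOB)")

# Hypothetical accumulated requests, flushed as one batch.
batch = [(f"k{i}", b"payload") for i in range(10_000)]

# One transaction, many insert statements, a single commit.
with con:  # commits on success, rolls back the whole batch on exception
    con.executemany("INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)", batch)
```

The same idea in JDBC would mean accumulating requests, then issuing one commit() per batch instead of one per insert.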

sqlite3 multiple inserts really slow

I have built a file archiver on Windows which uses sqlite3 to store files and takes advantage of multicore techniques to complete the archive faster.
I am trying a backup of 100,000 files now and insertion is slow.
When I comment out the line which inserts, the app uses 100% CPU, which is normal. With the insertion line in, it rarely gets above 25%.
As the archiving progresses, insertion gets slower and slower, processing a few files per second with a CPU usage of 11%. No disk usage is shown, so the bottleneck can't be the disk.
I've set:
PRAGMA temp_store = MEMORY
PRAGMA journal_mode = MEMORY
PRAGMA synchronous = OFF
and the entire insertion is within a transaction.
After further analysis, it seems that the problem is binding the blob with sqlite3_bind_blob64 (if I pass 0 as the size, it seems to be fine).
Why would SQLite have a problem inserting a raw blob of data into the archive?
Any ideas?
Thanks.
Your answer may lie here:
https://www.sqlite.org/threadsafe.html
Because it says there that:
The default mode is serialized.
which might explain your observations.
According to that document, you can either configure this at compile time (which I most definitely would not do myself) or via:
sqlite3_config (SQLITE_CONFIG_MULTITHREAD);
Just how stratospherically it then performs I wouldn't know.

OpenEdge 10.2B08 lruskips overhead

I'm writing a business case for implementing the lruskips parameter. I know the benefits (and they are likely sizeable). What I don't know is the performance overhead in terms of memory etc I should consider as the downsides. We need a balanced viewpoint after all!
OpenEdge 10.2B08
Various flavours of Windows that have not been patched with Linux.
The feature eliminates overhead by avoiding the housekeeping associated with maintaining the LRU chain. Using it does not add any memory requirements and it reduces CPU consumption.
Instead of moving a block to the head of the LRU chain every time it is referenced it only does that every X references. So rather than evicting the block that is absolutely the "least recently used" a block that is "probably not very recently used" will be evicted.
The only potential downside is that there could, theoretically, be a perverse case where setting it too high might result in some poor eviction decisions. IOW the "probably" part of things turns out to be untrue because you aren't checking often enough (you have basically converted the -B management to a FIFO queue).
For instance, setting it to 1000000 with a -B of 1000 might not be very smart. (But the bigger issue there is probably -B 1000.)
Suppose that you did that (set -lruskips 1000000 -B 1000) and your application has a mix of "normal" access along with a few background processes doing some sort of sequential scan to support reporting.
The reporting stuff is going to read lots of data that it only looks at once (or a very few times). This data is going to be immediately placed at the MRU end of the queue and will move towards the LRU end as new data is read.
Even though it is being frequently accessed the working set of "normal" data will be pushed to the LRU end because the access counter update is being skipped. With a smallish -B it is going to fall off the end pretty quickly and be evicted. And then the next access will cause an IO to occur and the cycle will restart.
The higher you set -lruskips the more sensitive you will become to that sort of thing. -lruskips 10 will eliminate 90% of lru housekeeping and should be a very, very low risk setting. -lruskips 100 will eliminate 99% of the housekeeping and should still be awfully low risk (unless -B is crazy small). Unless you have a very special set of circumstances setting it higher than 100 seems mostly pointless. You are well into "diminishing returns" and possibly pushing your luck vis a vis perverse outcomes.

Are global memory barriers required if only one work item reads and writes to memory

In my kernel, each work item has a reserved memory region in a buffer
that only it writes to and reads from.
Is it necessary to use memory barriers in this case?
EDIT:
I call mem_fence(CLK_GLOBAL_MEM_FENCE) before each write and before each read. Is this enough to guarantee load/store consistency?
Also, is this even necessary if only one work item is loading storing to this memory region ?
See this other stack overflow question:
In OpenCL, what does mem_fence() do, as opposed to barrier()?
Memory barriers work at the work-group level; that is, they stop the threads belonging to the same block until all of them have reached the barrier. If the memory spaces of different work items do not intersect, no extra synchronization point is needed.
Also, is this even necessary if only one work item is loading storing to this memory region ?
Theoretically, mem_fence only guarantees that earlier memory accesses are committed before later ones. In my case, I never saw any difference in the results of applications whether or not they made this mem_fence call.

SQLITE database WAL file size keeps growing

I am writing continuously into a db file which has PRAGMA journal_mode=WAL, PRAGMA journal_size_limit=0. My C++ program has two threads, one reader(queries at 15 sec intervals) and one writer(inserts at 5 sec intervals).
Every 3 min I am pausing insertion to run a sqlite3_wal_checkpoint_v2() from the writer thread with the mode parameter as SQLITE_CHECKPOINT_RESTART. To ensure that no active read operations are going on at this point, I set a flag that checkpointing is about to take place and wait for reader to complete (the connection is still open) before running checkpoint. After checkpoint completion I again indicate to readers it is okay to resume querying.
sqlite3_wal_checkpoint_v2() returns SQLITE_OK, with pnLog and pnCkpt equal (around 4000), indicating the complete WAL file has been synced into the main db file. So the next write should start from the beginning of the WAL, according to the documentation. However, this does not seem to be happening: subsequent writes cause the WAL file to grow indefinitely, eventually up to some GBs.
I did some searching and found that readers can cause checkpoint failure due to open transactions. However, the only reader I'm using ends its transaction before the checkpoint starts. What else could be causing the WAL file to keep growing?
This is far too late as an answer, but may be useful to other people.
According to the SQLite documentation, your expectations should be correct, but as this SO post explains, problems also arise with non-finalized statements. So if you only sqlite3_reset() your statement, there is still a chance that the db looks busy or locked to a checkpoint. Note that this can happen even with the more aggressive SQLITE_CHECKPOINT_ modes.
Also, the SQLITE_CHECKPOINT_TRUNCATE mode, if the checkpoint completes successfully, truncates the -wal file to zero length. That can help you verify that all pages have actually been transferred into the db.
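That check can be sketched with Python's stdlib sqlite3 (the C API call sqlite3_wal_checkpoint_v2() with SQLITE_CHECKPOINT_TRUNCATE corresponds to PRAGMA wal_checkpoint(TRUNCATE); the file name and row counts are made up):

```python
import os
import sqlite3
import tempfile

# Hypothetical WAL-mode database just for the demo.
path = os.path.join(tempfile.mkdtemp(), "wal-demo.sqlite")
con = sqlite3.connect(path)
con.execute("PRAGMA journal_mode = WAL")
con.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v BLOB)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                ((i, b"x" * 512) for i in range(2000)))
con.commit()

# TRUNCATE checkpoints all frames and shrinks the -wal file to zero bytes,
# which makes a blocked checkpoint easy to spot: busy != 0 or a non-empty -wal.
busy, log_frames, ckpt_frames = con.execute(
    "PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
```

If busy comes back non-zero or the -wal file keeps its size, some reader or unfinalized statement is still pinning the WAL.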
Another discussion in which -wal files grow larger and larger due to unfinalized statements is this.
