SQLite3: Disabling primary key index while inserting?

I have an SQLite3 database with a table whose primary key consists of two integers, and I'm trying to insert a lot of data into it (around 1 GB or so).
The issue I'm having is that creating the primary key also implicitly creates an index, which in my case bogs inserts down to a crawl after a few commits (and that would be because the database file is on NFS... sigh).
So, I'd like to somehow temporarily disable that index. My best plan so far involved dropping the primary key's automatic index; however, it seems that SQLite doesn't like that and throws an error if I attempt it.
My second-best plan would involve the application transparently making copies of the database on the network drive, making the modifications, and then merging them back. Note that, as opposed to most SQLite/NFS questions, I don't need access concurrency.
What would be a correct way to do something like that?
UPDATE:
I forgot to specify the flags I'm already using:
PRAGMA synchronous = OFF
PRAGMA journal_mode = OFF
PRAGMA locking_mode = EXCLUSIVE
PRAGMA temp_store = MEMORY
UPDATE 2:
I am in fact inserting items in batches; however, each successive batch is slower to commit than the previous one (I assume this has to do with the size of the index). I have tried batches of between 10k and 50k tuples, each tuple being two integers and a float.

You can't remove the implicit index, since it's the only way rows are addressed.
Merge your two integer keys into a single 64-bit key = (key1 << 32) + key2, and make that the INTEGER PRIMARY KEY in your schema (that way you will have only one index).
Set the page size of the new DB to at least 4096.
Remove ANY additional indexes except the primary one.
Insert the data in SORTED order so that the primary key is always growing.
Reuse prepared statements instead of recreating them from strings each time.
Set the page cache size to as much memory as you have left (remember that the cache size is a number of pages, not a number of bytes).
Commit every 50000 items.
If you need additional indexes, create them only AFTER ALL the data is in the table.
If you can merge the keys (I think you're using 32-bit keys, while SQLite uses 64-bit, so it's possible) and insert the data in sorted order, I bet you will load your first GB with the same performance as the second, and both will be fast enough.
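A minimal sketch of this advice using Python's sqlite3 module (the table and column names are made up for illustration, and the sizes are placeholders to tune for your machine):
import sqlite3

conn = sqlite3.connect("data.db")
conn.execute("PRAGMA page_size = 4096")      # only takes effect before the database is first written to
conn.execute("PRAGMA cache_size = 100000")   # measured in pages: 100000 * 4 KiB pages is roughly 400 MB of cache
conn.execute("PRAGMA synchronous = OFF")
conn.execute("CREATE TABLE IF NOT EXISTS samples (key INTEGER PRIMARY KEY, value REAL)")

def merged_key(key1, key2):
    # Pack the two 32-bit keys into one 64-bit INTEGER PRIMARY KEY.
    return (key1 << 32) | key2

def bulk_insert(rows):
    # rows: iterable of (key1, key2, value). Sorting keeps the primary key growing,
    # so new entries append at the right edge of the B-tree instead of splitting pages.
    batch = sorted((merged_key(k1, k2), v) for k1, k2, v in rows)
    insert_sql = "INSERT INTO samples (key, value) VALUES (?, ?)"  # one reusable statement
    for start in range(0, len(batch), 50000):
        conn.executemany(insert_sql, batch[start:start + 50000])
        conn.commit()                                              # commit every 50000 items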

Are you doing the INSERT of each new row as an individual transaction?
If you use BEGIN TRANSACTION and INSERT rows in batches, then I think the index will only get rebuilt at the end of each transaction.
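For illustration, a minimal sketch in Python's sqlite3 module (the table and sample data are made up) of committing once per batch instead of once per row:
import sqlite3

conn = sqlite3.connect("data.db", isolation_level=None)  # manage transactions by hand
conn.execute("CREATE TABLE IF NOT EXISTS my_table (a INTEGER, b INTEGER, c REAL)")
rows = ((i, i * 2, float(i)) for i in range(200000))      # sample data

cur = conn.cursor()
cur.execute("BEGIN")
for i, row in enumerate(rows, start=1):
    cur.execute("INSERT INTO my_table (a, b, c) VALUES (?, ?, ?)", row)
    if i % 50000 == 0:
        cur.execute("COMMIT")   # all changed pages (data and index) are flushed once per commit
        cur.execute("BEGIN")
cur.execute("COMMIT")
conn.close()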

See faster-bulk-inserts-in-sqlite3.


SQLite: re-arrange physical position of rows inside file

My problem is that my queries are too slow.
I have a fairly large SQLite database. The table is:
CREATE TABLE results (
timestamp TEXT,
name TEXT,
result float
)
(I know that timestamps as TEXT is not optimal, but please ignore that for the purposes of this question. I'll have to fix that when I have the time)
"name" is a category. This calculation holds the results of a calculation that has to be done at each timestamp for all "name"s. So the inserts are done at equal-timestamps, but the querys will be done at equal-names (i.e. I want given a name, get its time series), like:
SELECT timestamp,result WHERE name='some_name';
Now, the way I'm doing things now is to have no indexes, calculate all results, then create an index on name CREATE INDEX index_name ON results (name). The reasoning is that I don't need the index when I'm inserting, but having the index will make querys on the index really fast.
But it's not. The database is fairly large. It has about half a million timestamps, and for each timestamp I have about 1000 names.
I suspect, although I'm not sure, that the reason it's slow is that even though I've indexed the names, the rows are still scattered all around the physical disk. Something like:
timestamp1,name1,result
timestamp1,name2,result
timestamp1,name3,result
...
timestamp1,name999,result
timestamp1,name1000,result
timestamp2,name1,result
timestamp2,name2,result
etc...
I'm sure this is slower to query with name='some_name' than if the rows were physically ordered as:
timestamp1,name1,result
timestamp2,name1,result
timestamp3,name1,result
...
timestamp499997,name1000,result
timestamp499998,name1000,result
timestamp499999,name1000,result
timestamp500000,name1000,result
etc...
So, how do I tell SQLite that the order in which I'd like the rows in disk isn't the one they were written in?
UPDATE: I'm further convinced that the slowness in doing a select with such an index comes exclusively from non-contiguous disk access. Doing SELECT * FROM results WHERE name=<something_that_doesnt_exist> immediately returns zero results. This suggests that it's not finding the names that's slow, it's actually reading them from the disk.
Normal SQLite tables have, as a primary key, a 64-bit integer (known as rowid and a few other aliases). That determines the order in which rows are stored in a B*-tree (which puts all actual data in leaf-node pages). You can change this with a WITHOUT ROWID table, but that requires an explicit primary key, which is then used to place rows in a B-tree. So if every row's (name, timestamp) columns make a unique value, that's a possibility that will leave all rows with the same name on a smaller set of pages instead of scattered all over.
You'd want the composite PK to be in that order if you're searching for a particular name most of the time, so something like:
CREATE TABLE results (
timestamp TEXT
, name TEXT
, result REAL
, PRIMARY KEY (name, timestamp)
) WITHOUT ROWID
(And of course not bothering with a second index on name.) The tradeoff is that inserts are likely to be slower as the chances of needing to split a page in the B-tree go up.
Some pragmas worth looking into to tune things:
cache_size
mmap_size
optimize (After creating your index; also consider building sqlite with SQLITE_ENABLE_STAT4.)
Since you don't have an INTEGER PRIMARY KEY, consider VACUUM after deleting a lot of rows if you ever do that.
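For instance, from Python's sqlite3 module these could be applied roughly as follows (the sizes are placeholders rather than recommendations):
import sqlite3

conn = sqlite3.connect("results.db")
conn.execute("PRAGMA cache_size = -512000")     # negative values are in KiB, so about 500 MB of page cache
conn.execute("PRAGMA mmap_size = 1073741824")   # allow up to 1 GiB of memory-mapped I/O
# ... create the WITHOUT ROWID table and bulk-load the data here ...
conn.execute("PRAGMA optimize")                 # run once the data (and any extra indexes) are in place
conn.close()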

truncate not always working: why?

I have defined this mapper method:
@Delete("truncate table MY_TABLE")
public void wipeAllData();
and it usually works... but sometimes it doesn't. Is there any particular reason or known bug for that?
I'm using MyBatis 3.3.0 with Oracle 11g as the DBMS.
EDIT
Since you added the oracle11g tag, my previous answer is no longer valid, at least not as the reason why it would not be working, so I have edited it.
There are some reasons I'm aware of why it sometimes does not work in Oracle. According to the Oracle docs:
You cannot individually truncate a table that is part of a cluster. You must either truncate the cluster, delete all rows from the table, or drop and re-create the table.
You cannot truncate the parent table of an enabled foreign key constraint. You must disable the constraint before truncating the table. An exception is that you can truncate the table if the integrity constraint is self-referential.
You cannot truncate the parent table of a reference-partitioned table. You must first drop the reference-partitioned child table.
But you should be aware that using a TRUNCATE command is not ideal in an application scope. It should be an operation executed on the database only. The reason lies in another passage of the docs:
If table is not empty, then the database marks UNUSABLE all nonpartitioned indexes and all partitions of global partitioned indexes on the table. However, when the table is truncated, the index is also truncated, and a new high water mark is calculated for the index segment. This operation is equivalent to creating a new segment for the index. Therefore, at the end of the truncate operation, the indexes are once again USABLE.
So it can be a painfully long operation depending on indexes and the size of the table.
Also, for tables that have constraints, the truncate operation will not simply drop the data; it will delete the records one by one if your constraints specify ON DELETE CASCADE, and if they don't, an error will be thrown. This is still true for Oracle databases.
Another thing you should be aware of is:
Removing rows with the TRUNCATE TABLE statement can be faster than removing all rows with the DELETE statement, especially if the table has numerous triggers, indexes, and other dependencies.
So if you have a trigger on that table, it will not fire.
The original doc for the TRUNCATE command is here:
TRUNCATE TABLE

SQLite data retrieval with SELECT taking too long

I have created a table with SQLite for my Corona/Lua app. It's a hash table with roughly 700,000 values. The table has two columns: the hashcode (a string) and the value (another string). During the program I need to fetch data several times by providing the hashcode.
I'm using something like this code to get the data:
for p in db:nrows([[SELECT * FROM test WHERE id=']] .. hashcode .. [[';]]) do
print(p)
-- p = returned value --
end
This statement, though, is taking an insanely long time to run.
thanks,
Edit:
Success!
The mistake was with the primary key. I set the hashcode as the primary key as shown below, and the retrieval time went back to normal:
CREATE TABLE IF NOT EXISTS test (id STRING PRIMARY KEY , array);
I also prepared the statements in advance as you said:
stmt = db:prepare("SELECT * FROM test WHERE id = ?;")
[...]
stmt:bind(1,s)
for p in stmt:nrows() do
The only problem was that the DB file size, which was around 18 MB, went up to 29.5 MB.
You should create the table with id as a unique primary key; this will automatically make an index.
create table if not exists test
(
id text primary key,
val text
);
You should not construct statements using string concatenation; this is a security issue, so avoid getting into this habit. Also, you should prepare statements in advance, at program initialization, and run the prepared statements.
Something like this... initially:
hashcode_query_stmt = db:prepare("SELECT * FROM test WHERE id = ?;")
then for each use:
hashcode_query_stmt:bind_values(hashcode)
for p in hashcode_query_stmt:urows() do ... end
Ensure that there is an index on the id/hashcode column. Without one, such queries will be slow, slow, slow. This index should probably be unique.
If only selecting the value/hashcode (SELECT value FROM ..), it may be beneficial to have a covering index over (id, value) as that can avoid additional seeking to the row data (see SQLite Query Planning). Try it with and without such a covering index.
Also, it may be worthwhile to employ caching if the same hashcodes are queried multiple times.
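A small sketch of the covering-index idea, written with Python's sqlite3 module for brevity (the index name is made up; the same SQL works from Lua):
import sqlite3

conn = sqlite3.connect("test.db")
conn.execute("CREATE TABLE IF NOT EXISTS test (id TEXT PRIMARY KEY, val TEXT)")
# The two-column index lets SQLite answer the query from the index alone, without a
# second seek into the table rows; EXPLAIN QUERY PLAN should report a covering index.
conn.execute("CREATE INDEX IF NOT EXISTS idx_test_id_val ON test (id, val)")
for (val,) in conn.execute("SELECT val FROM test WHERE id = ?", ("some-hashcode",)):
    print(val)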
As already stated, make sure you have an index on id.
If you can't change the table schema now, you can add an index ad hoc:
CREATE INDEX test_id ON test (id);
About hashes: if you are computing hashes in your software to speed up searches, don't!
SQLite will treat your supplied hashes like any regular string/blob. Also, RDBMSs are optimized for efficient searching, which can be greatly improved with indexes.
Unless you're hashing to save space, you are wasting processor time computing hashes in your application.

Database Table Data Types To Store Key/Value Cache

I am working on a project that requires key/value caching functionality, but the application will live in a very limited environment that does not support any of the go-to industry-standard memory caching methods such as ASP.NET Cache, memcached, or AppFabric.
The only option we have in this restrictive environment is an MS SQL database. We have to create a simple key/value table to meet our key/value caching needs. We will most likely serialize the data as JSON, but I am not sure what would be the best data type for the key. It will obviously need to be unique and needs to be readable by the programmer getting and setting the cache. It also needs to be a fast lookup, since we are already losing performance by not having access to an "in memory" cache solution.
I am used to having my primary key column be an int or bigint value. In this case, should the primary key (the cache key) be a char or varchar data type, since all queries will be:
SELECT value FROM CacheTable WHERE key = 'keyname'
I also saw posts about using an MD5 hash, but other posts pointed out that hashing cannot be relied on to produce unique keys all the time. I'm basically after some advice on the data type and whether or not the 'key' column should be the primary key, or if I should still create an int or bigint primary key (even though it probably will not be used).
The end result we are after is a caching class similar to .NET's native caching, where we can create a static class that pulls from the database table, such as:
CustomDatabaseCache.Set(string key, object value);
CustomDatabaseCache.Get(string key)
I think in your scenario having a clustered primary key on your keyname column would work fine. However, it's worth experimenting with fill factors, because you want a fill factor that is low enough that you don't cause excessive page splits, but one that is high enough to keep the number of page reads low.
A clustered IDENTITY index works better in terms of eliminating page splits on the clustered index, and you could use a unique index on keyname with an INCLUDE clause to include your values. However, in your case I don't see the benefit of doing that, because you'd have exactly the same page-split problem on your unique index, and the clustered index on keyname would be no more expensive to read because you wouldn't have any extra columns. Plus, you would then have index update cost on two indexes per write.
Hope that helps.

Order of data in SQLite database

If I were to insert lots of rows into an empty table with no primary key and no indexes, with a varying number of rows inserted per transaction, could I then be sure that a SELECT * FROM the_table; would retrieve the data in the same order on both Linux and Windows?
No, you cannot and should never rely on the order of rows in a result set from a query that does not have ordering constraints. Even on the same platform, same database. Even if it works in your tests.
Things like VACUUMing your database (or some of the auto_vacuum modes, I think) could change the relative block layout of your data and alter the result set even if nothing else has changed (no inserts, no query plan change).
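If a stable order is needed, ask for it explicitly with ORDER BY. A minimal sketch (assuming an ordinary rowid table whose rows are never deleted, so the auto-assigned rowids happen to follow insertion order):
import sqlite3

conn = sqlite3.connect("data.db")
# Without ORDER BY, the row order is an implementation detail that may differ between
# platforms, file states, and SQLite versions; with ORDER BY, the order is guaranteed.
for row in conn.execute("SELECT * FROM the_table ORDER BY rowid"):
    print(row)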
