Improving SQLite Query Performance - sqlite

I have run the following query in SQLite and SQL Server. On SQLite the query has never finished running: I have let it sit for hours and it still continues to run. On SQL Server it takes a little less than a minute to run. The table has several hundred thousand records. Is there a way to improve the performance of the query in SQLite?
update tmp_tbl
set prior_symbol = (select o.symbol
from options o
where o.underlying_ticker = tmp_tbl.underlying_ticker
and o.option_type = tmp_tbl.option_type
and o.expiration = tmp_tbl.expiration
and o.strike = (select max(o2.strike)
from options o2
where o2.underlying_ticker = tmp_tbl.underlying_ticker
and o2.option_type = tmp_tbl.option_type
and o2.expiration = tmp_tbl.expiration
and o2.strike < tmp_tbl.strike));
Update: I was able to get what I needed done using some Python code and handling the data mapping outside of SQL. However, I am puzzled by the performance difference between SQLite and SQL Server. I was expecting SQLite to be much faster.
When I ran the above query initially, neither table had any indexes other than a standard primary key, id, which is unrelated to the data. I created two indexes as follows:
create index options_table_index on options(underlying_ticker, option_type, expiration, strike);
and:
create index tmp_tbl_index on tmp_tbl(underlying_ticker, option_type, expiration, strike);
But that didn't help. The query still continues to clock without any output - I let it run for nearly 40 minutes.
The table definition for tmp_tbl is:
create table tmp_tbl(id integer primary key,
symbol text,
underlying_ticker text,
option_type text,
strike real,
expiration text,
mid real,
prior_symbol real,
prior_premium real,
ratio real,
error_flag bit);
The definition of options table is similar but with a few more fields.
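For reference, the "mapping outside of SQL" workaround mentioned in the update could look roughly like the sketch below. It is not the asker's actual script, only a minimal illustration under the assumption that the options table fits in memory. The function name fill_prior_symbols is made up for this example; the table and column names come from the question.

```python
import sqlite3
from bisect import bisect_left
from collections import defaultdict


def fill_prior_symbols(conn: sqlite3.Connection) -> int:
    """Set tmp_tbl.prior_symbol to the symbol of the next-lower strike.

    Hypothetical helper: mirrors the UPDATE in the question, but does the
    matching in Python instead of in correlated subqueries.
    """
    # Collect (strike, symbol) pairs per (ticker, type, expiration) group.
    groups = defaultdict(list)
    for ticker, opt_type, expiration, strike, symbol in conn.execute(
        "SELECT underlying_ticker, option_type, expiration, strike, symbol "
        "FROM options WHERE strike IS NOT NULL"
    ):
        groups[(ticker, opt_type, expiration)].append((strike, symbol))

    # Sort each group by strike once so each lookup is a binary search.
    lookup = {}
    for key, pairs in groups.items():
        pairs.sort(key=lambda pair: pair[0])
        lookup[key] = ([s for s, _ in pairs], [sym for _, sym in pairs])

    updates = []
    rows = conn.execute(
        "SELECT id, underlying_ticker, option_type, expiration, strike FROM tmp_tbl"
    ).fetchall()
    for row_id, ticker, opt_type, expiration, strike in rows:
        strikes, symbols = lookup.get((ticker, opt_type, expiration), ([], []))
        # bisect_left returns i such that every strikes[:i] is < strike,
        # so strikes[i - 1] is the largest strike below the current one.
        i = bisect_left(strikes, strike) if strike is not None else 0
        updates.append((symbols[i - 1] if i > 0 else None, row_id))

    with conn:  # apply all updates in a single transaction
        conn.executemany("UPDATE tmp_tbl SET prior_symbol = ? WHERE id = ?", updates)
    return len(updates)
```

Rows with no lower strike in their group get NULL, which matches what the original UPDATE would store for them.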

Related

How can I return inserted ids for multiple rows in SQLite?

Given a table:
CREATE TABLE Foo(
Id INTEGER PRIMARY KEY AUTOINCREMENT,
Name TEXT
);
How can I return the ids of the multiple rows inserted at the same time using:
INSERT INTO Foo (Name) VALUES
('A'),
('B'),
('C');
I am aware of last_insert_rowid() but I have not found any examples of using it for multiple rows.
What I am trying to achieve can be seen in this SQL Server example:
DECLARE @InsertedRows AS TABLE (Id BIGINT);
INSERT INTO [Foo] (Name) OUTPUT Inserted.Id INTO @InsertedRows VALUES
('A'),
('B'),
('C');
SELECT Id FROM @InsertedRows;
Any help is very much appreciated.
This is not possible. If you want to get three values, you have to execute three INSERT statements.
Given SQLite3 locking:
An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
And how Last Insert Rowid works:
...returns the rowid of the most recent successful INSERT into a rowid table or virtual table on database connection D.
It should be safe to assume that while a writer executes its batch INSERT into a ROWID table, no other writer can make the generated primary keys non-consecutive. Thus the inserted primary keys are [lastrowid - rowcount + 1, lastrowid]. Or, in the Python sqlite3 API:
cursor.execute(...) # multi-VALUE INSERT
assert cursor.rowcount == len(values)
lastrowids = range(cursor.lastrowid - cursor.rowcount + 1, cursor.lastrowid + 1)
This holds under normal circumstances, that is, when you don't mix explicitly provided keys with keys that are expected to be generated. As the AUTOINCREMENT documentation states:
The normal ROWID selection algorithm described above will generate monotonically increasing unique ROWIDs as long as you never use the maximum ROWID value and you never delete the entry in the table with the largest ROWID.
The above should work as expected.
This Python script can be used to test correctness of the above for multi-threaded and multi-process setup.
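Separately from that script, here is a minimal single-connection sketch of the range computation, using the Foo table from the question. The helper name insert_names is made up for this example, and it assumes a non-empty list of names and that no explicit ids are mixed in.

```python
import sqlite3


def insert_names(conn, names):
    """Insert all names with one multi-row INSERT and return the inferred ids.

    Hypothetical helper: relies on the ids of one batch being consecutive,
    as argued above.
    """
    placeholders = ", ".join("(?)" for _ in names)
    with conn:  # commit (or roll back) the batch as one transaction
        cursor = conn.execute(
            f"INSERT INTO Foo (Name) VALUES {placeholders}", list(names)
        )
    assert cursor.rowcount == len(names)
    first_id = cursor.lastrowid - cursor.rowcount + 1
    return list(range(first_id, cursor.lastrowid + 1))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo(Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
new_ids = insert_names(conn, ["A", "B", "C"])
```

The returned ids can be checked against SELECT Id FROM Foo to confirm the inference on a given setup.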
Other databases
For instance, MySQL InnoDB (at least in the default innodb_autoinc_lock_mode = 1 "consecutive" lock mode) works in a similar way (though obviously under much more concurrent conditions) and guarantees that the inserted PKs can be inferred from lastrowid:
"Simple inserts" (for which the number of rows to be inserted is known in advance) avoid table-level AUTO-INC locks by obtaining the required number of auto-increment values under the control of a mutex (a light-weight lock) that is only held for the duration of the allocation process, not until the statement completes

SQLite slow but barely using machine resources

I have a 500MB sqlite database of about 5 million rows with the following schema:
CREATE TABLE my_table (
id1 VARCHAR(12) NOT NULL,
id2 VARCHAR(3) NOT NULL,
date DATE NOT NULL,
val1 NUMERIC,
val2 NUMERIC,
val3 NUMERIC,
val4 NUMERIC,
val5 INTEGER,
PRIMARY KEY (id1, id2, date)
);
I am trying to run:
SELECT count(ROWID) FROM my_table
The query has now been running for several minutes which seems excessive to me. I am aware that sqlite is not optimized for count(*)-type queries.
I could accept this if at least my machine appeared to be hard at work. However, my CPU load hovers somewhere around 0-1%. "Disk Delta Total Bytes" in Process Explorer is about 500.000.
Any idea if this can be sped up?
You should have an index on any fields you query like this, for example: create index tags_index on tags(tag);. With that in place, I am sure the query will be faster. Secondly, try to normalize your table and run a test (without an index), then compare the results.
In most cases, count(*) would be faster than count(rowid).
If you have a (non-partial) index, computing the row count can be done faster with that because less data needs to be loaded from disk.
In this case, the primary key constraint already has created such an index.
I would look at the disk I/O if I were you. I guess it is quite high. Considering the size of your database, some of the data has to be read from disk, which makes the disk the bottleneck.
Two ideas from my rudimentary knowledge of SQLite.
Idea 1: If memory is not a problem in your case and your application is launched once and runs several queries, I would try to increase the amount of cache used (there's a cache_size pragma available). After a bit of googling I found this link about SQLite tweaking: http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html
Idea 2: I would try to have an autoincremented primary key (on a single column) and try to tweak my request using SELECT COUNT(DISTINCT row_id) FROM my_table; . This could force the counting to be only run on what's contained in the index.
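As a rough illustration of Idea 1 combined with the count(*) suggestion, the sketch below raises the page cache for one connection before counting. The helper name count_rows and the 64 MiB default are assumptions for this example, and whether it helps depends on where the time actually goes.

```python
import sqlite3


def count_rows(conn, cache_kib=65536):
    """Count rows in my_table after enlarging this connection's page cache.

    Hypothetical helper; cache_kib is an arbitrary example value.
    """
    # A negative cache_size is interpreted as a size in KiB,
    # a positive value as a number of pages.
    conn.execute(f"PRAGMA cache_size = {-int(cache_kib)}")
    # count(*) lets SQLite pick the cheapest b-tree to scan, which can be
    # the automatic index created for the PRIMARY KEY constraint.
    (total,) = conn.execute("SELECT count(*) FROM my_table").fetchone()
    return total
```

The pragma applies only to the connection it is run on, so it has to be issued again after reconnecting.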

SQLite data retrieve with select taking too long

I have created a table with SQLite for my Corona/Lua app. It's a hashtable with about 700,000 values. The table has two columns: the hashcode (a string) and the value (another string). During the program I need to get data several times by providing the hashcode.
I'm using something like this code to get the data:
for p in db:nrows([[SELECT * FROM test WHERE id=']].."hashcode"..[[';]]) do
print(p)
-- p = returned value --
end
However, this statement is taking far too much time to run.
thanks,
Edit:
Success!
The mistake was with the primary key. I set the hashcode as the primary key as shown below and the retrieval time went back to normal:
CREATE TABLE IF NOT EXISTS test (id STRING PRIMARY KEY , array);
I also prepared the statements in advance as you said:
stmt = db:prepare("SELECT * FROM test WHERE id = ?;")
[...]
stmt:bind(1,s)
for p in stmt:nrows() do
The only problem was that the db file size, which was around 18 MB, went up to 29.5 MB.
You should create the table with id as a unique primary key; this will automatically make an index.
create table if not exists test
(
id text primary key,
val text
);
You should not construct statements using string concatenation; this is a security issue so avoid getting in this habit. Also, you should prepare statements in advance, at program initialization, and run the prepared statements.
Something like this... initially:
hashcode_query_stmt = db:prepare("SELECT * FROM test WHERE id = ?;")
then for each use:
hashcode_query_stmt:bind_values(hashcode)
for p in hashcode_query_stmt:urows() do ... end
Ensure that there is an index on the id/hashcode column. Without one, such queries will be slow, slow, slow. This index should probably be unique.
If only selecting the value/hashcode (SELECT value FROM ..), it may be beneficial to have a covering index over (id, value) as that can avoid additional seeking to the row data (see SQLite Query Planning). Try it with and without such a covering index.
Also, it may be worthwhile to employ caching if the same hashcodes are queried multiple times.
As already stated, make sure you have an index on ID.
If you can't change the table schema now, you can add an index ad hoc:
CREATE INDEX test_id ON test (id);
About hashes: if you are computing hashes in your software to speed up searches, don't!
SQLite will use your supplied hashes as any regular string/blob. Also, RDBMS are optimized for efficient searching, which may be greatly improved with indexes.
Unless you're hashing to save space, you are wasting processor time computing hashes in your application.
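The same advice (primary key on the hashcode, bound parameters instead of string concatenation) translated to Python's sqlite3 module, as a minimal sketch. The question itself uses Lua, so this is only an illustration. The helper names are made up, and the table layout follows the id/val schema suggested above.

```python
import sqlite3


def create_table(conn):
    # TEXT PRIMARY KEY makes SQLite create a unique index on id automatically.
    conn.execute("CREATE TABLE IF NOT EXISTS test (id TEXT PRIMARY KEY, val TEXT)")


def lookup(conn, hashcode):
    """Return the value stored for hashcode, or None if it is absent."""
    # The ? placeholder binds the value instead of splicing it into the SQL
    # text; sqlite3 also keeps an internal cache of compiled statements, so
    # repeating the same SQL string avoids re-parsing it.
    row = conn.execute("SELECT val FROM test WHERE id = ?", (hashcode,)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_table(conn)
conn.executemany("INSERT INTO test (id, val) VALUES (?, ?)", [("h1", "one"), ("h2", "two")])
found = lookup(conn, "h2")
```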

sqlite3 autoincrement - am I missing something?

I want to create unique order numbers for each day. So ideally, in PostgreSQL for instance, I could create a sequence and read it back for these unique numbers, because the readback both gets me the new number and is atomic. Then at close of day, I'd reset the sequence.
In sqlite3, however, I only see an autoincrement for the integer field type. So say I set up a table with an autoincrement field, and insert a record to get the new number (seems like an awfully inefficient way to do it, but anyway...) When I go to read the max back, who is to say that another task hasn't gone in there and inserted ANOTHER record, thereby causing me to read back a miss, with my number one too far advanced (and a duplicate of what the other task reads back.)
Conceptually, I require:
fast lock with wait for other tasks
increment number
retrieve number
unlock
...I just don't see how to do that with sqlite3. Can anyone enlighten me?
In SQLite, autoincrementing fields are intended to be used as actual primary keys for their records.
You should just use it as the ID for your orders table.
If you really want to have an atomic counter independent of corresponding table records, use a table with a single record.
ACID is ensured with transactions:
BEGIN;
SELECT number FROM MyTable;
UPDATE MyTable SET number = ? + 1;
COMMIT;
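A minimal Python sketch of that single-row counter follows. It uses BEGIN IMMEDIATE rather than a plain BEGIN, which is a substitution on my part: it takes the write lock up front, so a second task waits (up to the busy timeout) instead of reading a stale number. It assumes MyTable holds exactly one row and that the connection was opened with isolation_level=None so that transactions are controlled manually. The function name is made up.

```python
import sqlite3


def next_order_number(conn):
    """Atomically increment the counter row and return the new value.

    Hypothetical helper; expects MyTable(number) to contain a single row.
    """
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before reading
    try:
        conn.execute("UPDATE MyTable SET number = number + 1")
        (number,) = conn.execute("SELECT number FROM MyTable").fetchone()
        conn.execute("COMMIT")
        return number
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None stops sqlite3 from opening transactions implicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE MyTable (number INTEGER NOT NULL)")
conn.execute("INSERT INTO MyTable (number) VALUES (0)")
first = next_order_number(conn)
second = next_order_number(conn)
```

Resetting at close of day would then be a single UPDATE MyTable SET number = 0 inside the same kind of transaction.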
OK, it looks like SQLite either doesn't have what I need, or I am missing it. Here's what I came up with:
declare zorder as integer primary key autoincrement, zuid integer in orders table
this means every new row gets an ascending number, starting with 1
generate a random number:
rnd = int(random.random() * 1000000) # unseeded python uses system time
create new order (just the SQL for simplicity):
'INSERT INTO orders (zuid) VALUES ('+str(rnd)+')'
find that exact order number using the random number:
'SELECT zorder FROM orders WHERE zuid = '+str(rnd)
pack away that number as the new order number (newordernum)
clobber the random number to reduce collision risks
'UPDATE orders SET zuid = 0 WHERE zorder = '+str(newordernum)
...and now I have a unique new order, I know what the correct order number is, the risk of a read collision is reduced to negligible, and I can prepare that order without concern that I'm trampling on another newly created order.
Just goes to show you why DB authors implement sequences, lol.

Sqlite3: Disabling primary key index while inserting?

I have an Sqlite3 database with a table and a primary key consisting of two integers, and I'm trying to insert lots of data into it (ie. around 1GB or so)
The issue I'm having is that creating primary key also implicitly creates an index, which in my case bogs down inserts to a crawl after a few commits (and that would be because the database file is on NFS.. sigh).
So, I'd like to somehow temporarily disable that index. My best plan so far involved dropping the primary key's automatic index; however, it seems that SQLite doesn't like that and throws an error if I attempt to do it.
My second best plan would involve the application making transparent copies of the database on the network drive, making modifications and then merging it back. Note that as opposed to most SQlite/NFS questions, I don't need access concurrency.
What would be a correct way to do something like that?
UPDATE:
I forgot to specify the flags I'm already using:
PRAGMA synchronous = OFF
PRAGMA journal_mode = OFF
PRAGMA locking_mode = EXCLUSIVE
PRAGMA temp_store = MEMORY
UPDATE 2:
I'm in fact inserting items in batches, however every next batch is slower to commit than previous one (I'm assuming this has to do with the size of index). I tried doing batches of between 10k and 50k tuples, each one being two integers and a float.
You can't remove the embedded index, since it's the only address of a row.
Merge your 2 integer keys into a single long key = (key1<<32) + key2; and make this an INTEGER PRIMARY KEY in your schema (in that case you will have only 1 index)
Set page size for new DB at least 4096
Remove ANY additional index except primary
Fill in data in the SORTED order so that primary key is growing.
Reuse prepared commands; don't build them from strings each time
Set page cache size to as much memory as you have left (remember that cache size is in number of pages, but not number of bytes)
Commit every 50000 items.
If you have additional indexes - create them only AFTER ALL data is in table
If you'll be able to merge key (I think you're using 32bit, while sqlite using 64bit, so it's possible) and fill data in sorted order I bet you will fill in your first Gb with the same performance as second and both will be fast enough.
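The key-merging, sorted-order, and batched-commit points above can be sketched in Python as follows. The table name items, its columns, and the helper names are assumptions for this example. It also assumes key1 is below 2**31 and key2 below 2**32 so the merged key fits SQLite's signed 64-bit integer, and that the data fits in memory for sorting.

```python
import sqlite3


def merge_key(key1, key2):
    """Pack two non-negative 32-bit keys into one 64-bit integer key."""
    return (key1 << 32) + key2


def bulk_insert(conn, rows, batch_size=50000):
    """Insert (key1, key2, value) rows in primary-key order, batch by batch."""
    # Sorting by the merged key means every insert lands at the end of the
    # table's b-tree rather than at a random position.
    merged = sorted((merge_key(k1, k2), value) for k1, k2, value in rows)
    insert_sql = "INSERT INTO items (id, value) VALUES (?, ?)"  # reused for every batch
    for start in range(0, len(merged), batch_size):
        with conn:  # one transaction (and one commit) per batch
            conn.executemany(insert_sql, merged[start:start + batch_size])
    return len(merged)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value REAL)")
inserted = bulk_insert(conn, [(2, 1, 0.5), (1, 7, 1.5), (1, 3, 2.5)], batch_size=2)
```

The original pair can be recovered with key1 = id >> 32 and key2 = id & 0xFFFFFFFF.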
Are you doing the INSERT of each new row as an individual transaction?
If you use BEGIN TRANSACTION and INSERT rows in batches then I think the index will only get rebuilt at the end of each Transaction.
See faster-bulk-inserts-in-sqlite3.

Resources