sqlite optimal update after an alter table add

I have a process where at some point I have to add a new column of type INTEGER to a table, and then populate this new column with an UPDATE. I do this in C.
I can pare my code down to:
CREATE TABLE t (a INTEGER, b INTEGER);
-- populate t
CREATE INDEX t_ndx ON t (a);
ALTER TABLE t ADD c INTEGER;
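The C pseudocode to update column 'c' looks like this:
sqlite3_stmt *u;
sqlite3_prepare_v2(db, "update t set c=? where a=?", -1, &u, 0);
for (i = 0; i < n; i++) {
  c = c_a[i];
  a = a_a[i];
  sqlite3_bind_int64(u, 1, c);  /* first ? is c */
  sqlite3_bind_int64(u, 2, a);  /* second ? is a */
  sqlite3_step(u);
  sqlite3_reset(u);             /* rewind the statement so it can be stepped again */
}
sqlite3_finalize(u);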
The order of the a values in this UPDATE is the same as the order used when t was populated.
I'd like to know whether the sqlite3 engine detects the 'sequential' access and speeds up the "where a=?" (i.e., keeps some kind of cache of the previous cursor position).
I'd also like to know whether there are 'hidden' features, such as binding arrays (at least when dealing with INTEGERs), that avoid constructing such a loop, all those bindings, and the bytecode for all those updates; something along the lines of:
sqlite3_stmt *u;
sqlite3_prepare_v2(db, "update t set c=? where a=?", -1, &u, 0);
sqlite3_bind_int64_array(u, 1, c_a, n);
sqlite3_bind_int64_array(u, 2, a_a, n);
sqlite3_step_array(u, n);
Thanx in advance
Cheers
Phi

Your code already is pretty much optimal. Searching for the row by a needs a single index lookup and a single table row lookup; both will be fast because the needed pages are likely to be already cached.
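For comparison, here is a minimal Python sketch of the same loop using the standard sqlite3 module on an in-memory database with made-up data (the table and column names follow the question; update_c is a hypothetical helper). executemany is the closest thing this binding has to "binding an array", but it still binds and steps one prepared UPDATE per row, exactly like the C loop; it only hides the loop.

```python
import sqlite3


def update_c(con: sqlite3.Connection, a_values, c_values) -> None:
    """Set t.c for every matching t.a, reusing one prepared UPDATE statement."""
    # executemany prepares the statement once, then binds and steps it once per
    # (c, a) pair: the same loop as the C code, just hidden inside the module.
    con.executemany("UPDATE t SET c = ? WHERE a = ?", zip(c_values, a_values))
    # In its default mode the module wraps DML in an implicit transaction,
    # so the whole batch is committed in one go here.
    con.commit()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
con.executemany("INSERT INTO t (a, b) VALUES (?, ?)", [(i, i * 10) for i in range(1, 6)])
con.execute("CREATE INDEX t_ndx ON t (a)")
con.execute("ALTER TABLE t ADD c INTEGER")
update_c(con, [1, 2, 3, 4, 5], [100, 200, 300, 400, 500])
print(con.execute("SELECT a, c FROM t ORDER BY a").fetchall())
```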
You could speed up the lookup on a by making this column the INTEGER PRIMARY KEY, but this makes sense only if a actually is the primary key.
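One way to see the difference is EXPLAIN QUERY PLAN. The Python sketch below (standard sqlite3 module; update_plan is a hypothetical helper) prints the plan for the question's UPDATE with a separate index on a, and then with a declared as the INTEGER PRIMARY KEY. The exact plan wording varies between SQLite versions, so the comments only name the part that should appear.

```python
import sqlite3


def update_plan(*schema: str) -> str:
    """Return the query-plan text SQLite chooses for the question's UPDATE."""
    con = sqlite3.connect(":memory:")
    for stmt in schema:
        con.execute(stmt)
    # EXPLAIN QUERY PLAN rows have four columns; the last one is the readable detail.
    rows = con.execute("EXPLAIN QUERY PLAN UPDATE t SET c = 1 WHERE a = 2").fetchall()
    con.close()
    return " | ".join(row[3] for row in rows)


# Question's layout: plain table plus a separate index on a.
# Should mention the index t_ndx (index lookup, then a table row lookup).
print(update_plan("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)",
                  "CREATE INDEX t_ndx ON t (a)"))

# a as INTEGER PRIMARY KEY: a is the rowid, so the table B-tree is searched directly.
# Should mention INTEGER PRIMARY KEY.
print(update_plan("CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER, c INTEGER)"))
```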
In theory, it would be possible to update multiple rows at once:
UPDATE t
SET c = CASE a
WHEN :a1 THEN :c1
WHEN :a2 THEN :c2
WHEN :a3 THEN :c3
END
WHERE a IN (:a1, :a2, :a3);
But for many a values, this is likely to be implemented as a scan over the table, so it would make sense only if you could fit all values into the query, which is not possible for a large table.
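A sketch of how the CASE form could be generated for a small batch of (a, c) pairs, assuming the Python sqlite3 module; update_batch is a hypothetical helper. The number of bound parameters per statement is also capped (SQLITE_MAX_VARIABLE_NUMBER), which is one more reason this only suits small batches.

```python
import sqlite3


def update_batch(con: sqlite3.Connection, pairs) -> None:
    """Apply a small batch of (a, c) pairs with one CASE-based UPDATE."""
    pairs = list(pairs)
    if not pairs:
        return
    # One "WHEN ? THEN ?" arm per pair, plus the same a values in the IN list.
    arms = " ".join("WHEN ? THEN ?" for _ in pairs)
    in_list = ", ".join("?" for _ in pairs)
    sql = f"UPDATE t SET c = CASE a {arms} END WHERE a IN ({in_list})"
    # Positional parameters are bound in textual order: CASE arms first, then the IN list.
    params = [v for a, c in pairs for v in (a, c)] + [a for a, _ in pairs]
    con.execute(sql, params)
    con.commit()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
con.executemany("INSERT INTO t (a, b) VALUES (?, ?)", [(i, i * 10) for i in range(1, 6)])
update_batch(con, [(2, 20), (4, 40)])
print(con.execute("SELECT a, c FROM t ORDER BY a").fetchall())
```

Rows whose a is not in the batch are excluded by the WHERE clause, so their c is left untouched.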

Related

Efficient insertion of row and foreign table row if it does not exist

Similar to this question and this solution for PostgreSQL (in particular "INSERT missing FK rows at the same time"):
Suppose I am making an address book with a "Groups" table and a "Contact" table. When I create a new Contact, I may want to place them into a Group at the same time. So I could do:
INSERT INTO Contact VALUES (
'Bob',
(SELECT group_id FROM Groups WHERE name = 'Friends')
)
But what if the "Friends" Group doesn't exist yet? Can we insert this new Group efficiently?
The obvious thing is to do a SELECT to test if the Group exists already; if not do an INSERT. Then do an INSERT into Contacts with the sub-SELECT above.
Or I can constrain Group.name to be UNIQUE, do an INSERT OR IGNORE, then INSERT into Contacts with the sub-SELECT.
I can also keep my own cache of which Groups exist, but that seems like I'm duplicating functionality of the database in the first place.
My guess is that there is no way to do this in one query, since INSERT does not return anything and cannot be used in a subquery. Is that intuition correct? What is the best practice here?
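The questioner's second option (UNIQUE group name, INSERT OR IGNORE, then INSERT with the sub-SELECT) can be sketched in Python like this. The Groups/Contact schema and the add_contact helper are assumptions made for illustration; wrapping both statements in one transaction means either both rows are written or neither is.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: group names are UNIQUE, so a repeated INSERT OR IGNORE is a no-op.
con.executescript("""
CREATE TABLE Groups (group_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE Contact (name TEXT, group_id INTEGER REFERENCES Groups(group_id));
""")


def add_contact(con: sqlite3.Connection, contact: str, group: str) -> None:
    """Insert a contact, creating its group first if it does not exist yet."""
    with con:  # one transaction: commits on success, rolls back on an exception
        con.execute("INSERT OR IGNORE INTO Groups (name) VALUES (?)", (group,))
        con.execute(
            "INSERT INTO Contact VALUES (?, (SELECT group_id FROM Groups WHERE name = ?))",
            (contact, group),
        )


add_contact(con, "Bob", "Friends")
add_contact(con, "Alice", "Friends")
add_contact(con, "Carol", "Family")
print(con.execute(
    "SELECT Contact.name, Groups.name FROM Contact JOIN Groups USING (group_id) ORDER BY 1"
).fetchall())
```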
You could use a trigger and a small modification of the tables, and then you could do it with a single query.
For example, consider the following:
Purely for convenience of producing the demo:-
DROP TRIGGER IF EXISTS add_group_if_not_exists;
DROP TABLE IF EXISTS contact;
DROP TABLE IF EXISTS groups;
One-time setup SQL :-
CREATE TABLE IF NOT EXISTS groups (id INTEGER PRIMARY KEY, group_name TEXT UNIQUE);
INSERT INTO groups VALUES(-1,'NOTASSIGNED');
CREATE TABLE IF NOT EXISTS contact (id INTEGER PRIMARY KEY, contact TEXT, group_to_use TEXT, group_reference INTEGER DEFAULT -1 REFERENCES groups(id));
CREATE TRIGGER IF NOT EXISTS add_group_if_not_exists
AFTER INSERT ON contact
BEGIN
INSERT OR IGNORE INTO groups (group_name) VALUES(new.group_to_use);
UPDATE contact SET group_reference = (SELECT id FROM groups WHERE group_name = new.group_to_use), group_to_use = NULL WHERE id = new.id;
END;
SQL that would be used on an ongoing basis :-
INSERT INTO contact (contact,group_to_use) VALUES
('Fred','Friends'),
('Mary','Family'),
('Ivan','Enemies'),
('Sue','Work colleagues'),
('Arthur','Fellow Rulers'),
('Amy','Work colleagues'),
('Henry','Fellow Rulers'),
('Canute','Fellow Ruler')
;
The number of values and the actual values would vary.
SQL Just for demonstration of the result
SELECT * FROM groups;
SELECT contact,group_name FROM contact JOIN groups ON group_reference = groups.id;
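To reproduce the demo without the result screenshots, here is a Python sqlite3 sketch that runs the one-time setup, the multi-row INSERT and the two demonstration queries against an in-memory database. It follows the SQL above, except that group_reference is declared INTEGER here (an assumption, so the stored reference has the same type as groups.id).

```python
import sqlite3

SETUP = """
CREATE TABLE IF NOT EXISTS groups (id INTEGER PRIMARY KEY, group_name TEXT UNIQUE);
INSERT INTO groups VALUES(-1,'NOTASSIGNED');
-- group_reference declared INTEGER here so it matches the type of groups.id
CREATE TABLE IF NOT EXISTS contact (id INTEGER PRIMARY KEY, contact TEXT, group_to_use TEXT,
    group_reference INTEGER DEFAULT -1 REFERENCES groups(id));
CREATE TRIGGER IF NOT EXISTS add_group_if_not_exists
AFTER INSERT ON contact
BEGIN
    INSERT OR IGNORE INTO groups (group_name) VALUES(new.group_to_use);
    UPDATE contact
       SET group_reference = (SELECT id FROM groups WHERE group_name = new.group_to_use),
           group_to_use = NULL
     WHERE id = new.id;
END;
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)  # one-time setup

# The ongoing single INSERT; the trigger creates any missing group per row.
con.execute("""
INSERT INTO contact (contact, group_to_use) VALUES
('Fred','Friends'), ('Mary','Family'), ('Ivan','Enemies'), ('Sue','Work colleagues'),
('Arthur','Fellow Rulers'), ('Amy','Work colleagues'), ('Henry','Fellow Rulers'),
('Canute','Fellow Ruler')
""")
con.commit()

groups = [name for (name,) in con.execute("SELECT group_name FROM groups ORDER BY id")]
members = con.execute(
    "SELECT contact, group_name FROM contact JOIN groups ON group_reference = groups.id"
).fetchall()
print(groups)
print(members)
```

Note that 'Fellow Ruler' ends up as its own group, which is the typo hazard mentioned below.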
Results
This results in :-
1) The groups (note that the group "NOTASSIGNED" is intrinsic to the working of the above and hence is added initially) :-
You have to be careful about mistakes such as 'Fellow Ruler' instead of 'Fellow Rulers'.
-1 is used because it would not be a normal, automatically generated value.
2) The contacts with the respective group :-
Efficient insertion
That could likely be debated from here to eternity, so I'll leave it for the fence sitters/destroyers to decide :). However, some considerations:-
It works and appears to do what is wanted.
It's a little wasteful because of the additional column.
It tries to minimise the waste by setting that column to NULL once the group reference has been resolved.
There will obviously be an overhead, but compared with the alternatives it is probably negligible; it might matter if you were extracting every Facebook user, but if it's driven by user input it's likely irrelevant.
What is the best practice here?
Fences again. :)
Note: hopefully this is obvious, but the DROP statements are purely for convenience, and all the other SQL up until the INSERT is run once to set up the tables and trigger in preparation for the single INSERT that adds a group if necessary.

How can I return inserted ids for multiple rows in SQLite?

Given a table:
CREATE TABLE Foo(
Id INTEGER PRIMARY KEY AUTOINCREMENT,
Name TEXT
);
How can I return the ids of the multiple rows inserted at the same time using:
INSERT INTO Foo (Name) VALUES
('A'),
('B'),
('C');
I am aware of last_insert_rowid() but I have not found any examples of using it for multiple rows.
What I am trying to achieve can be seen in this SQL Server example:
DECLARE @InsertedRows AS TABLE (Id BIGINT);
INSERT INTO [Foo] (Name) OUTPUT Inserted.Id INTO @InsertedRows VALUES
('A'),
('B'),
('C');
SELECT Id FROM @InsertedRows;
Any help is very much appreciated.
This is not possible. If you want to get three values, you have to execute three INSERT statements.
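As an aside not covered by the answers here: SQLite 3.35.0 (2021) and later support a RETURNING clause on INSERT, which does what the SQL Server OUTPUT example does. A minimal Python sketch, assuming the SQLite library your Python links against is new enough (insert_names is a hypothetical helper):

```python
import sqlite3


def insert_names(con: sqlite3.Connection, names) -> list:
    """Insert one Foo row per name and return the generated Ids (SQLite 3.35.0+)."""
    if sqlite3.sqlite_version_info < (3, 35, 0):
        raise RuntimeError("INSERT ... RETURNING needs SQLite 3.35.0 or later")
    names = list(names)
    placeholders = ", ".join("(?)" for _ in names)
    cur = con.execute(f"INSERT INTO Foo (Name) VALUES {placeholders} RETURNING Id", names)
    ids = [row[0] for row in cur.fetchall()]  # read every row before committing
    con.commit()
    # SQLite does not promise an order for RETURNING rows, so sort them.
    return sorted(ids)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Foo(Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
if sqlite3.sqlite_version_info >= (3, 35, 0):
    print(insert_names(con, ["A", "B", "C"]))
```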
Given SQLite3 locking:
An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
And how Last Insert Rowid works:
...returns the rowid of the most recent successful INSERT into a rowid table or virtual table on database connection D.
It should be safe to assume that while a writer executes its batch INSERT into a rowid table, no other writer can make the generated primary keys non-consecutive. Thus the inserted primary keys are [lastrowid - rowcount + 1, lastrowid]. Or, in the Python sqlite3 API:
cursor.execute(...) # multi-VALUE INSERT
assert cursor.rowcount == len(values)
lastrowids = range(cursor.lastrowid - cursor.rowcount + 1, cursor.lastrowid + 1)
This holds in normal circumstances, i.e. when you don't mix explicitly provided and generated keys, or, as the AUTOINCREMENT documentation states:
The normal ROWID selection algorithm described above will generate monotonically increasing unique ROWIDs as long as you never use the maximum ROWID value and you never delete the entry in the table with the largest ROWID.
Under those conditions the above should work as expected.
This Python script can be used to test correctness of the above for multi-threaded and multi-process setup.
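Separately from that test script, here is a small single-connection sketch of the fragment above, assuming the standard sqlite3 module and the Foo table from the question; insert_batch is a hypothetical helper. The final SELECT only cross-checks that the inferred range matches the stored Ids.

```python
import sqlite3


def insert_batch(con: sqlite3.Connection, names) -> list:
    """Multi-row INSERT into Foo; infer the generated Ids from lastrowid and rowcount."""
    names = list(names)
    placeholders = ", ".join("(?)" for _ in names)
    cur = con.execute(f"INSERT INTO Foo (Name) VALUES {placeholders}", names)
    assert cur.rowcount == len(names)
    # Relies on the answer's assumptions: one writer holds the write lock for the
    # whole statement, and all keys are generated rather than supplied.
    ids = list(range(cur.lastrowid - cur.rowcount + 1, cur.lastrowid + 1))
    con.commit()
    return ids


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Foo(Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
ids = insert_batch(con, ["A", "B", "C"])
# Cross-check the inferred range against what was actually stored.
stored = [row[0] for row in con.execute("SELECT Id FROM Foo WHERE Name IN ('A', 'B', 'C') ORDER BY Id")]
print(ids, stored)
```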
Other databases
For instance, MySQL InnoDB (at least with innodb_autoinc_lock_mode = 1, the "consecutive" lock mode, which was the default before MySQL 8.0) works in a similar way (though obviously under much more concurrent conditions) and guarantees that the inserted PKs can be inferred from lastrowid:
"Simple inserts" (for which the number of rows to be inserted is known in advance) avoid table-level AUTO-INC locks by obtaining the required number of auto-increment values under the control of a mutex (a light-weight lock) that is only held for the duration of the allocation process, not until the statement completes

sqlite3 autoincrement - am I missing something?

I want to create unique order numbers for each day. So ideally, in PostgreSQL for instance, I could create a sequence and read it back for these unique numbers, because the readback both gets me the new number and is atomic. Then at close of day, I'd reset the sequence.
In sqlite3, however, I only see an autoincrement for the integer field type. So say I set up a table with an autoincrement field, and insert a record to get the new number (seems like an awfully inefficient way to do it, but anyway...) When I go to read the max back, who is to say that another task hasn't gone in there and inserted ANOTHER record, thereby causing me to read back a miss, with my number one too far advanced (and a duplicate of what the other task reads back.)
Conceptually, I require:
fast lock with wait for other tasks
increment number
retrieve number
unlock
...I just don't see how to do that with sqlite3. Can anyone enlighten me?
In SQLite, autoincrementing fields are intended to be used as actual primary keys for their records.
You should just use it as the ID for your orders table.
If you really want to have an atomic counter independent of corresponding table records, use a table with a single record.
ACID is ensured with transactions:
BEGIN IMMEDIATE;
SELECT number FROM MyTable;
UPDATE MyTable SET number = ? + 1;
COMMIT;
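A minimal Python sketch of that single-row counter, assuming a table MyTable holding one row (the table and the next_order_number helper are illustrative). It uses BEGIN IMMEDIATE so the write lock is taken before the read; a second task running the same code then waits at its own BEGIN IMMEDIATE (up to the connection timeout) instead of reading the same old value.

```python
import sqlite3


def next_order_number(con: sqlite3.Connection) -> int:
    """Increment the single-row counter in MyTable and return the new value."""
    # BEGIN IMMEDIATE acquires the write lock before the SELECT, so another
    # connection doing the same cannot start its transaction until we finish.
    con.execute("BEGIN IMMEDIATE")
    try:
        (number,) = con.execute("SELECT number FROM MyTable").fetchone()
        con.execute("UPDATE MyTable SET number = ? + 1", (number,))
        con.execute("COMMIT")
    except Exception:
        con.execute("ROLLBACK")
        raise
    return number + 1


# isolation_level=None: the module issues no implicit BEGIN; we control transactions.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE MyTable (number INTEGER NOT NULL)")
con.execute("INSERT INTO MyTable VALUES (0)")
print(next_order_number(con), next_order_number(con))
```

Resetting the sequence at close of day would then be a single UPDATE MyTable SET number = 0.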
OK, it looks like sqlite either doesn't have what I need, or I am missing it. Here's what I came up with:
declare zorder as integer primary key autoincrement, zuid integer in orders table
this means every new row gets an ascending number, starting with 1
generate a random number:
rnd = int(random.random() * 1000000) # unseeded python uses system time
create new order (just the SQL for simplicity):
'INSERT INTO orders (zuid) VALUES ('+str(rnd)+')'
find that exact order number using the random number:
'SELECT zorder FROM orders WHERE zuid = '+str(rnd)
pack away that number as the new order number (newordernum)
clobber the random number to reduce collision risks
'UPDATE orders SET zuid = 0 WHERE zorder = '+str(newordernum)
...and now I have a unique new order, I know what the correct order number is, the risk of a read collision is reduced to negligible, and I can prepare that order without concern that I'm trampling on another newly created order.
Just goes to show you why DB authors implement sequences, lol.
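A possibly simpler alternative: SQLite records the rowid of the last successful INSERT per database connection (sqlite3_last_insert_rowid(), exposed in Python as cursor.lastrowid), so as long as each task uses its own connection, other tasks' inserts do not affect the value, and the random zuid tag may not be needed. A Python sketch, assuming the orders table described above (create_order is a hypothetical helper):

```python
import sqlite3


def create_order(con: sqlite3.Connection) -> int:
    """Insert a new order row and return its zorder."""
    cur = con.execute("INSERT INTO orders DEFAULT VALUES")
    con.commit()
    # zorder is an INTEGER PRIMARY KEY, i.e. the rowid, so lastrowid is the new
    # order number. It comes from sqlite3_last_insert_rowid(), which is tracked
    # per connection, so inserts by other tasks on their own connections don't change it.
    return cur.lastrowid


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (zorder INTEGER PRIMARY KEY AUTOINCREMENT, zuid INTEGER)")
print(create_order(con), create_order(con))
```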

Obtaining Read/Write Lock for Sqlite Parallel Updates

I write a lot of parallel scripts in python for research purposes and was wondering if it is possible to obtain a read/write lock manually in sqlite for a specific set of commands. Here is an oversimplified example of why I need it:
Simple Example:
Suppose I have a table (A) and I want to simply count the number of rows in it and store the result in another table (B) in parallel. To do so, I run 10 instances of a program that counts rows for a certain range of A and adds the sum of the rows to the property in B.
The problem is that I need to read the property in B, add it to the instance's count, and save it; all while making sure none of the other instances are doing this process. Normally it is only a write lock that is needed - in this case I need a read lock as well...
I was hoping I could do something like this:
read/write lock
Grab current value in B, add it to the
instance count
save new value
unlock
Is there a way to do this?
Thanks.
You can increment an integer value using a single UPDATE statement:
sqlite> CREATE TABLE B(Id INTEGER, Value INTEGER);
sqlite> INSERT INTO B VALUES(0, 15);
sqlite> UPDATE B SET Value=Value + 23 WHERE Id = 0;
sqlite> SELECT * FROM B;
0|38
sqlite>
Using a single UPDATE statement makes this operation atomic, making any extra locking unnecessary.
If you need more complex processing, you can use SQL transactions to ensure that any complex database operations are performed atomically.
In general, you should avoid any locking external to SQLite or messing with the SQLite locking subsystem - doing so is a very good recipe for deadlocks...
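A Python sketch of the parallel counting case with that single UPDATE. Threads stand in for the question's separate processes; each worker opens its own connection to a temporary database file, and the timeout makes a worker wait for the write lock instead of failing with "database is locked". The helper names and the counts are made up for illustration.

```python
import os
import sqlite3
import tempfile
import threading


def add_count(path: str, count: int) -> None:
    """One worker: add its partial count to B.Value in a single atomic UPDATE."""
    # Own connection per worker, autocommit mode; wait up to 30 s for the write lock.
    con = sqlite3.connect(path, timeout=30, isolation_level=None)
    try:
        con.execute("UPDATE B SET Value = Value + ? WHERE Id = 0", (count,))
    finally:
        con.close()


def parallel_total(counts) -> int:
    """Run one worker per count against a fresh database file and return the total."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)  # an empty file is opened by SQLite as a new, empty database
    try:
        con = sqlite3.connect(path, isolation_level=None)
        con.execute("CREATE TABLE B(Id INTEGER, Value INTEGER)")
        con.execute("INSERT INTO B VALUES(0, 0)")
        workers = [threading.Thread(target=add_count, args=(path, c)) for c in counts]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        (total,) = con.execute("SELECT Value FROM B WHERE Id = 0").fetchone()
        con.close()
        return total
    finally:
        os.remove(path)


print(parallel_total([3, 5, 7, 11]))
```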
EDIT:
To append to a string, you can use the || concatenation operator:
sqlite> CREATE TABLE C(Id INTEGER, Value TEXT);
sqlite> INSERT INTO C VALUES(0, 'X');
sqlite> UPDATE C SET Value=Value || 'Y' WHERE Id = 0;
sqlite> SELECT * FROM C;
0|XY
sqlite>

Using SQLite how do I index columns in a CREATE TABLE statement?

How do I index a column in my CREATE TABLE statement? The table looks like this:
command.CommandText =
"CREATE TABLE if not exists file_hash_list( " +
"id INTEGER PRIMARY KEY, " +
"hash BLOB NOT NULL, " +
"filesize INTEGER NOT NULL);";
command.ExecuteNonQuery();
I want filesize to be indexed, and I would like the index to be 4 bytes.
You can't do precisely what you're asking, but unlike some RDBMSs, SQLite is able to perform DDL inside of a transaction, with the appropriate result. This means that if you're really concerned about nothing ever seeing file_hash_list without an index, you can do
BEGIN;
CREATE TABLE file_hash_list (
id INTEGER PRIMARY KEY,
hash BLOB NOT NULL,
filesize INTEGER NOT NULL
);
CREATE INDEX file_hash_list_filesize_idx ON file_hash_list (filesize);
COMMIT;
or the equivalent using the transaction primitives of whatever database library you've got there.
I'm not sure how necessary that really is though, compared to just doing the two commands outside of a transaction.
As others have pointed out, SQLite's indexes are all B-trees; you don't get to choose what the type is or what portion of the column is indexed; the entire indexed column(s) are in the index. It will still be efficient for range queries, it just might take up a little more disk space than you'd really like.
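A Python sketch (standard sqlite3 module, in-memory database) that runs the transactional DDL above and then asks EXPLAIN QUERY PLAN whether a range query on filesize uses the index. The plan text differs between SQLite versions, so only the index name is worth checking.

```python
import sqlite3

# isolation_level=None: the script's own BEGIN/COMMIT control the transaction.
con = sqlite3.connect(":memory:", isolation_level=None)
con.executescript("""
BEGIN;
CREATE TABLE file_hash_list (
    id INTEGER PRIMARY KEY,
    hash BLOB NOT NULL,
    filesize INTEGER NOT NULL
);
CREATE INDEX file_hash_list_filesize_idx ON file_hash_list (filesize);
COMMIT;
""")

indexes = [name for (name,) in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'file_hash_list'")]

# A range query on filesize; the B-tree index can serve the range directly.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM file_hash_list WHERE filesize BETWEEN 1000 AND 5000"
).fetchall()
detail = " | ".join(row[3] for row in plan)
print(indexes)
print(detail)
```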
I don't think you can. Why can't you use a second query? And specifying the size of the index seems to be impossible in SQLite. But then again, it is SQLite (emphasis on Lite) ;)
CREATE INDEX myIndex ON myTable (myColumn);
