alter table and insert perf (serial read/write detection) - sqlite

I have a table 't' with an integer primary key 'k', I'd like to alter the table 't' to add one more column 'c' then fill the 'c' column in a C language iteration loop.
I extracted all the primary keys in a array then plan to prepare() an UPDATE query like this (pseudo code)
q=prepare("UPDATE t set c=? where k=?");
for(i=0;i<n;i++)
{ x=k[i];
bind(q,0,cvalue); bind(q,1,x);
step(q);
...
}
Since the key 'x' that will be presented to the UPDATE query are the primary key in sequence, does SQLITE clever enough to avoid a bsearch on the k=?, in other word, does it detect the 'serial' write access.
Thanx in advance.
Cheers,
Phi

Related

Converting a field to lower case and merging data in an sqlite database

I need to merge some randomly uppercased data that has been collected in an SQLite table key_val, such that key is always lowercase and no vals are lost. There is a unique compound index on key,val.
The initial data looks like this:
key|val
abc|1
abc|5
aBc|1
aBc|5
aBc|3
aBc|2
AbC|1
abC|3
The result after the merge would be
key|val
abc|1
abc|2
abc|3
abc|5
In my programmer brain, I would
for each `key` with upper case letters;
if a lower cased `key` is found with the same value
then delete `key`
else update `key` to lower case
Re implementing the loop has a sub query for each row found with upper case letters, to check if the val already exists as a lower case key
If it does, I can delete the cased key.
From there I can UPDATE key = lower(key) as the "duplicates" have been removed.
The first cut of the programming method of finding the dupes is:
SELECT * FROM key_val as parent
WHERE parent.key != lower(parent.key)
AND 0 < (
SELECT count(s.val) FROM key_val as s
WHERE s.key = lower(parent.key) AND s.val = parent.val
)
ORDER BY parent.key DESC;
I'm assuming there's a better way to do this in SQLite? The ON CONFLICT functionality seems to me like it should be able to handle the dupe deletion on UPDATE but I'm not seeing it.
First delete all the duplicates:
DELETE FROM key_val AS k1
WHERE EXISTS (
SELECT 1
FROM key_val AS k2
WHERE LOWER(k2.key) = LOWER(k1.key) AND k2.val = k1.val AND k2.rowid < k1.rowid
);
by keeping only 1 combination of key and val with the min rowid.
It is not important if you kept the key with all lower chars or not, because the 2nd step is to update the table:
UPDATE key_val
SET key = LOWER(key);
See the demo.
Honestly it might just be easier to create a new table and then insert into it. As it seems you really just want a distinct select here, use:
INSERT INTO kev_val_new ("key", val)
SELECT DISTINCT LOWER("key"), val
FROM key_val;
Once you have populated the new table, you may drop the old one, and then rename the new one to the previous name:
DROP TABLE key_val;
ALTER TABLE key_val_new RENAME TO key_val;
I agree with #Tim that it would be easire to re-create table using simple select distict lower().. statement, but that's not always easy if table has dependant objects (indexes, triggers, views). In this case this can be done as sequence of two steps:
insert lowered keys which are not still there:
insert into t
select distinct lower(tr.key) as key, tr.val
from t as tr
left join t as ts on ts.key = lower(tr.key) and ts.val = tr.val
where ts.key is null;
now when we have all lowered keys - remove other keys:
delete from t where key <> lower(key);
See fiddle: http://sqlfiddle.com/#!5/84db50/11
However this method assumes that key is always populated (otherwise it would be a strange key)
If vals can be null then "ts.val = tr.val" should be replaced with more complex stuff like ifnull(ts.val, -1) = ifnull(tr.val, -1) where -1 is some unused value (can be different). If we can't assume any unused value like -1 then it should be more complex check for null / not null cases.

SQLite Query plan

Is there a way to manipulate the query plan generated in SQLite?
I 'l try to explain my problem:
I have 3 tables:
CREATE TABLE "index_term" (
"id" INT,
"term" VARCHAR(255) NOT NULL,
PRIMARY KEY("id"),
UNIQUE("term"));
CREATE TABLE "index_posting" (
"doc_id" INT NOT NULL,
"term_id" INT NOT NULL,
PRIMARY KEY("doc_id", "field_id", "term_id"),,
CONSTRAINT "index_posting_doc_id_fkey" FOREIGN KEY ("doc_id")
REFERENCES "document"("doc_id") ON DELETE CASCADE,
CONSTRAINT "index_posting_term_id_fkey" FOREIGN KEY ("term_id")
REFERENCES "index_term"("id") ON DELETE CASCADE);;
CREATE INDEX "index_posting_term_id_idx" ON "index_posting"("term_id");
CREATE TABLE "published_files" (
"doc_id" INTEGER NOT NULL,,
"uri_id" INTEGER,
"user_id" INTEGER NOT NULL,
"status" INTEGER NOT NULL,
"title" VARCHAR(1024),
PRIMARY KEY("uri_id"));
CREATE INDEX "published_files_doc_id_idx" ON "published_files"("doc_id");
about 600.000 entries in the index_term, about 4 Millions in the index_posting and 300.000 in the published_files table.
Now when i want to find the number of unique doc_ids in index_posting which reference some terms i use the following SQL.
select count(distinct index_posting.doc_id) from index_term, index_posting
where
index_posting.term_id = index_term.id and index_term.term like '%test%'
The result is displayed in reasonable time (0.3 secs). Asking Explain Query plan returns
0|0|0|SCAN TABLE index_term
0|1|1|SEARCH TABLE index_posting USING INDEX index_posting_term_id_idx (term_id=?)
When i want to filter the count in the way that it only includes doc_ids of index_posting if there exists a published_files entry:
select count(distinct index_posting.doc_id) from index_term, index_posting,
published_files where
index_posting.term_id = index_term.id and index_posting.doc_id = published_files.doc_id and index_term.term like '%test%'
The query takes almost 10 times as long. Asking Explain Query plan returns
0|0|1|SCAN TABLE index_posting
0|1|0|SEARCH TABLE index_term USING INDEX sqlite_autoindex_index_term_1 (id=?)
0|2|2|SEARCH TABLE published_files AS pf USING COVERING INDEX published_files_doc_id_idx (doc_id=?)
So as far as i understand SQLITE changed here its query plan doing a full table scan of index_posting and a lookup in index_term instead of the other way around.
As a workaround i did do a
analyze index_posting;
analyze index_term;
analyze published_files;
and now it seems correct,
0|0|0|SCAN TABLE index_term
0|1|1|SEARCH TABLE index_posting USING INDEX index_posting_term_id_idx (term_id=?)
0|2|2|SEARCH TABLE published_files USING COVERING INDEX published_files_doc_id_idx (doc_id=?)
but my question is - is there a way to force SQLITE to always use the correct query plan?
TIA
ANALYZE is not a workaround; it's supposed to be used.
You can use CROSS JOIN to enforce a certain order of the nested loops, or use INDEXED BY to force a certain index to be used.
However, you asked for "the correct query plan", which might not be same as the one enforced by these mechanisms.

Will AutoIncrement work with Check constraints?

The question is simple:
In SQLite, if I choose to AutoIncrement a primary key of type NUMERIC which has a check constraint like CHECK(LENGTH(ID) == 10), will it work correctly inserting the first value as 0000000001 and so on?
No, that does not work. Adding a check does not magically also add a way of fullfilling the check to insert the data.
See this SQLFiddle.
If you want to restrict the value of an autoincrement column like that, you need to seed the internal sequence table. (There are other ways.)
create table foo (
foo_id integer primary key autoincrement,
other_columns char(1) default 'x',
check (length(foo_id) = 10 )
);
insert into sqlite_sequence values ('foo', 999999999);
Application code is allowed to modify the sqlite_sequence table, to
add new rows, to delete rows, or to modify existing rows.
Source
insert into foo (other_columns) values ('a');
select * from foo;
1000000000|a
Trying to insert 11 digits makes the CHECK constraint fail.
insert into foo values (12345678901, 'a');
Error: CHECK constraint failed: foo
One alternative is to insert a "fake" row with the first valid id number immediately after creating the table. Then delete it.
create table foo(...);
insert into foo values (1000000000, 'a');
delete from foo;
Now you can insert normally.
insert into foo (other_columns) values ('b');
select * from foo;
1000000001|b
In fact the ID's length is 1, so it doesn't work.

Strange SQLite behavior: Not returning results on simple queries

Ok, so I have a basic table called "ledger", it contains fields of various types, integers, varchar, etc.
In my program, I used to use a query with no "from" predicate to collect all of the rows, which of course works fine. But... I changed my code to allow selecting one row at a time using "where acctno = x" (where X is the account number I want to select at the time).
I thought this must be a bug in the client library for my programming language, so I tested it in the SQLite command-line client - and it still doesn't work!
I am relatively new to SQLite, but I have been using Oracle, MS SQL Server, etc. for years and never seen this type of issue before.
Other things I can tell you:
* Queries using other integer fields also don't work
* Queries on char fields work
* Querying it as a string (with the account number on quotes) still doesn't work. (I thought maybe the numbers were stored as a string inadvertently).
* Accessing rows by rowid works fine - which is why I can edit the database with GUI tools with no noticeable problem.
Examples:
Query with no WHERE (works fine):
1|0|0|JPY|8|Paid-In Capital|C|X|0|X|0|0||||0|0|0|
0|0|0|JPY|11|Root Account|P|X|0|X|0|0|SYSTEM|20121209|150000|0|0|0|
3|0|0|JPY|13|Mitsubishi Bank Futsuu|A|X|0|X|0|0|SYSTEM|20121209|150000|0|0|0|
4|0|0|JPY|14|Japan Post Bank|A|X|0|X|0|0|SYSTEM|20121209|150000|0|0|0|
...
Query with WHERE clause: (no results)
sqlite> select * from ledger where acctno=1;
sqlite>
putting quotes around the 1 above changes nothing.
Interestingly enough, "select * from ledger where acctno > 1" returns results! However since it returns ALL results, it's not terrible useful.
I'm sure someone will ask about the table structure, so here goes:
sqlite> .schema ledger
CREATE TABLE "LEDGER" (
"ACCTNO" integer(10,0) NOT NULL,
"drbal" integer(20,0) NOT NULL,
"crbal" integer(20,0) NOT NULL,
"CURRKEY" char(3,0) NOT NULL,
"TEXTKEY" integer(10,0),
"TEXT" VARCHAR(64,0),
"ACCTYPECD" CHAR(1,0) NOT NULL,
"ACCSTCD" CHAR(1,0),
"PACCTNO" number(10,0) NOT NULL,
"CATCD" number(10,0),
"TRANSNO" number(10,0) NOT NULL,
"extrefno" number(10,0),
"UPDATEUSER" VARCHAR(32,0),
"UPDATEDATE" text(8,0),
"UPDATETIME" TEXT(6,0),
"PAYEECD" number(10,0) NOT NULL,
"drbal2" number(10,0) NOT NULL,
"crbal2" number(10,0) NOT NULL,
"delind" boolean,
PRIMARY KEY("ACCTNO"),
CONSTRAINT "fk_curr" FOREIGN KEY ("CURRKEY") REFERENCES "CURRENCY" ("CUR
RKEY") ON DELETE RESTRICT ON UPDATE CASCADE
);
The strangest thing is that I have other similar tables where this works fine!
sqlite> select * from journalhdr where transno=13;
13|Test transaction ATM Withdrawel 20130213|20130223||20130223||
TransNo in that table is also integer (10,0) NOT NULL - this is what makes me thing it is something to do with the values.
Another clue is that the sort order seems to be based on ascii, not numeric:
sqlite> select * from ledger order by acctno;
0|0|0|JPY|11|Root Account|P|X|0|X|0|0|SYSTEM|20121209|150000|0|0|0|
1|0|0|JPY|8|Paid-In Capital|C|X|0|X|0|0||||0|0|0|
10|0|0|USD|20|Sallie Mae|L|X|0|X|0|0|SYSTEM|20121209|153900|0|0|0|
21|0|0|USD|21|Skrill|A|X|0|X|0|0|SYSTEM|20121209|154000|0|0|0|
22|0|0|USD|22|AES|L|X|0|X|0|0|SYSTEM|20121209|154200|0|0|0|
23|0|0|JPY|23|Marui|L|X|0|X|0|0|SYSTEM|20121209|154400|0|0|0|
24|0|0|JPY|24|Amex JP|L|X|0|X|0|0|SYSTEM|20121209|154500|0|0|0|
3|0|0|JPY|13|Mitsubishi Bank Futsuu|A|X|0|X|0|0|SYSTEM|20121209|150000|0|0|0|
Of course the sort order on journalhdr (where the select works properly) is numeric.
Solved! (sort-of)
The data can be fixed like this:
sqlite> update ledger set acctno = 23 where rowid = 13;
sqlite> select * from ledger where acctno = 25;
25|0|0|JPY|0|Test|L|X|0|X|0|0|SYSTEM|20130224|132500|0|0|0|
Still, if it was stored as strings, then that leave a few questions:
1. Why couldn't I select it as a string using the quotes?
2. How did it get stored as a string since it is a valid integer?
3. How would you go about detecting this problem normally besides noticing bizzarre symptoms?
Although the data would normally be entered by my program, some of it was created by hand using Navicat, so I assume the problem must lie there.
You are victim of SQLite dynamic typing.
Even though SQLite defines system of type affinity, which sets some rules on how input strings or numbers will be converted to actual internal values, but it does NOT prevent software that is using prepared statements to explicitly set any type (and data value) for the column (and this can be different per row!).
This can be shown by this simple example:
CREATE TABLE ledger (acctno INTEGER, name VARCHAR(16));
INSERT INTO ledger VALUES(1, 'John'); -- INTEGER '1'
INSERT INTO ledger VALUES(2 || X'00', 'Zack'); -- BLOB '2\0'
I have inserted second row not as INTEGER, but as binary string containing embedded zero byte. This reproduces your issue exactly, see this SQLFiddle, step by step. You can also execute these commands in sqlite3, you will get the same result.
Below is Perl script that also reproduces this issue
This script creates just 2 rows with acctno having values of integer 1 for first, and "2\0" for second row. "2\0" means string consisting of 2 bytes: first is digit 2, and second is 0 (zero) byte.
Of course, it is very difficult to visually tell "2\0" from just "2", but this is what script below demonstrates:
#!/usr/bin/perl -w
use strict;
use warnings;
use DBI qw(:sql_types);
my $dbh = DBI->connect("dbi:SQLite:test.db") or die DBI::errstr();
$dbh->do("DROP TABLE IF EXISTS ledger");
$dbh->do("CREATE TABLE ledger (acctno INTEGER, name VARCHAR(16))");
my $sth = $dbh->prepare(
"INSERT INTO ledger (acctno, name) VALUES (?, ?)");
$sth->bind_param(1, "1", SQL_INTEGER);
$sth->bind_param(2, "John");
$sth->execute();
$sth->bind_param(1, "2\0", SQL_BLOB);
$sth->bind_param(2, "Zack");
$sth->execute();
$sth = $dbh->prepare(
"SELECT count(*) FROM ledger WHERE acctno = ?");
$sth->bind_param(1, "1");
$sth->execute();
my ($num1) = $sth->fetchrow_array();
print "Number of rows matching id '1' is $num1\n";
$sth->bind_param(1, "2");
$sth->execute();
my ($num2) = $sth->fetchrow_array();
print "Number of rows matching id '2' is $num2\n";
$sth->bind_param(1, "2\0", SQL_BLOB);
$sth->execute();
my ($num3) = $sth->fetchrow_array();
print "Number of rows matching id '2<0>' is $num3\n";
Output of this script is:
Number of rows matching id '1' is 1
Number of rows matching id '2' is 0
Number of rows matching id '2<0>' is 1
If you were to look at resultant table using any SQLite tool (including sqlite3), it will print 2 for second row - they all get confused by trailing 0 inside a BLOB when it gets coerced to string or number.
Note that I had to use custom param binding to coerce type to BLOB and permit null bytes stored:
$sth->bind_param(1, "2\0", SQL_BLOB);
Long story short, it is either some of your client programs, or some of client tools like Navicat which screwed it up.

INSERT IF NOT EXISTS ELSE UPDATE?

I've found a few "would be" solutions for the classic "How do I insert a new record or update one if it already exists" but I cannot get any of them to work in SQLite.
I have a table defined as follows:
CREATE TABLE Book
ID INTEGER PRIMARY KEY AUTOINCREMENT,
Name VARCHAR(60) UNIQUE,
TypeID INTEGER,
Level INTEGER,
Seen INTEGER
What I want to do is add a record with a unique Name. If the Name already exists, I want to modify the fields.
Can somebody tell me how to do this please?
Have a look at http://sqlite.org/lang_conflict.html.
You want something like:
insert or replace into Book (ID, Name, TypeID, Level, Seen) values
((select ID from Book where Name = "SearchName"), "SearchName", ...);
Note that any field not in the insert list will be set to NULL if the row already exists in the table. This is why there's a subselect for the ID column: In the replacement case the statement would set it to NULL and then a fresh ID would be allocated.
This approach can also be used if you want to leave particular field values alone if the row in the replacement case but set the field to NULL in the insert case.
For example, assuming you want to leave Seen alone:
insert or replace into Book (ID, Name, TypeID, Level, Seen) values (
(select ID from Book where Name = "SearchName"),
"SearchName",
5,
6,
(select Seen from Book where Name = "SearchName"));
You should use the INSERT OR IGNORE command followed by an UPDATE command:
In the following example name is a primary key:
INSERT OR IGNORE INTO my_table (name, age) VALUES ('Karen', 34)
UPDATE my_table SET age = 34 WHERE name='Karen'
The first command will insert the record. If the record exists, it will ignore the error caused by the conflict with an existing primary key.
The second command will update the record (which now definitely exists)
You need to set a constraint on the table to trigger a "conflict" which you then resolve by doing a replace:
CREATE TABLE data (id INTEGER PRIMARY KEY, event_id INTEGER, track_id INTEGER, value REAL);
CREATE UNIQUE INDEX data_idx ON data(event_id, track_id);
Then you can issue:
INSERT OR REPLACE INTO data VALUES (NULL, 1, 2, 3);
INSERT OR REPLACE INTO data VALUES (NULL, 2, 2, 3);
INSERT OR REPLACE INTO data VALUES (NULL, 1, 2, 5);
The "SELECT * FROM data" will give you:
2|2|2|3.0
3|1|2|5.0
Note that the data.id is "3" and not "1" because REPLACE does a DELETE and INSERT, not an UPDATE. This also means that you must ensure that you define all necessary columns or you will get unexpected NULL values.
INSERT OR REPLACE will replace the other fields to default value.
sqlite> CREATE TABLE Book (
ID INTEGER PRIMARY KEY AUTOINCREMENT,
Name TEXT,
TypeID INTEGER,
Level INTEGER,
Seen INTEGER
);
sqlite> INSERT INTO Book VALUES (1001, 'C++', 10, 10, 0);
sqlite> SELECT * FROM Book;
1001|C++|10|10|0
sqlite> INSERT OR REPLACE INTO Book(ID, Name) VALUES(1001, 'SQLite');
sqlite> SELECT * FROM Book;
1001|SQLite|||
If you want to preserve the other field
Method 1
sqlite> SELECT * FROM Book;
1001|C++|10|10|0
sqlite> INSERT OR IGNORE INTO Book(ID) VALUES(1001);
sqlite> UPDATE Book SET Name='SQLite' WHERE ID=1001;
sqlite> SELECT * FROM Book;
1001|SQLite|10|10|0
Method 2
Using UPSERT (syntax was added to SQLite with version 3.24.0 (2018-06-04))
INSERT INTO Book (ID, Name)
VALUES (1001, 'SQLite')
ON CONFLICT (ID) DO
UPDATE SET Name=excluded.Name;
The excluded. prefix equal to the value in VALUES ('SQLite').
Firstly update it. If affected row count = 0 then insert it. Its the easiest and suitable for all RDBMS.
Upsert is what you want. UPSERT syntax was added to SQLite with version 3.24.0 (2018-06-04).
CREATE TABLE phonebook2(
name TEXT PRIMARY KEY,
phonenumber TEXT,
validDate DATE
);
INSERT INTO phonebook2(name,phonenumber,validDate)
VALUES('Alice','704-555-1212','2018-05-08')
ON CONFLICT(name) DO UPDATE SET
phonenumber=excluded.phonenumber,
validDate=excluded.validDate
WHERE excluded.validDate>phonebook2.validDate;
Be warned that at this point the actual word "UPSERT" is not part of the upsert syntax.
The correct syntax is
INSERT INTO ... ON CONFLICT(...) DO UPDATE SET...
and if you are doing INSERT INTO SELECT ... your select needs at least WHERE true to solve parser ambiguity about the token ON with the join syntax.
Be warned that INSERT OR REPLACE... will delete the record before inserting a new one if it has to replace, which could be bad if you have foreign key cascades or other delete triggers.
If you have no primary key, You can insert if not exist, then do an update. The table must contain at least one entry before using this.
INSERT INTO Test
(id, name)
SELECT
101 as id,
'Bob' as name
FROM Test
WHERE NOT EXISTS(SELECT * FROM Test WHERE id = 101 and name = 'Bob') LIMIT 1;
Update Test SET id='101' WHERE name='Bob';
I believe you want UPSERT.
"INSERT OR REPLACE" without the additional trickery in that answer will reset any fields you don't specify to NULL or other default value. (This behavior of INSERT OR REPLACE is unlike UPDATE; it's exactly like INSERT, because it actually is INSERT; however if what you wanted is UPDATE-if-exists you probably want the UPDATE semantics and will be unpleasantly surprised by the actual result.)
The trickery from the suggested UPSERT implementation is basically to use INSERT OR REPLACE, but specify all fields, using embedded SELECT clauses to retrieve the current value for fields you don't want to change.
I think it's worth pointing out that there can be some unexpected behaviour here if you don't thoroughly understand how PRIMARY KEY and UNIQUE interact.
As an example, if you want to insert a record only if the NAME field isn't currently taken, and if it is, you want a constraint exception to fire to tell you, then INSERT OR REPLACE will not throw and exception and instead will resolve the UNIQUE constraint itself by replacing the conflicting record (the existing record with the same NAME). Gaspard's demonstrates this really well in his answer above.
If you want a constraint exception to fire, you have to use an INSERT statement, and rely on a separate UPDATE command to update the record once you know the name isn't taken.

Resources