Indexes with custom collations in sqlite - sqlite

Assuming I have a schema like this:
CREATE TABLE abc(
id INTEGER PRIMARY KEY AUTOINCREMENT,
txt TEXT
);
CREATE INDEX "txtCS" ON "abc"("txt" COLLATE MY_CUSTOM_SORT);
when will sqlite use my index on txt ?
because I ran:
EXPLAIN QUERY PLAN SELECT * FROM abc ORDER BY txt COLLATE MY_CUSTOM_SORT DESC ...
and it tells me that it scans the table, twice, using the txtCS index (It doesn't search like I expected.)
MY_CUSTOM_SORT is my own sorting function that I hooked with sqliteCreateCollation. I just need that index for some queries that involve special ordering and I want them to be fast

In the EXPLAIN QUERY PLAN output, SEARCH means that the database tries to look up some particular record(s) with specific values, while SCAN means that the database goes through the entire table.
This query returns all records, so the most efficient operation is a SCAN.
Either operation can be sped up with an index.
(In a SCAN, the database just goes through all index entries in order.)

Related

SQLite queryslow when using index

I have a table indexed on a text column, and I want all my queries to return results ordered by name without any performance hit.
Table has around 1 million rows if it matters.
Table -
CREATE TABLE table (Name text)
Index -
CREATE INDEX "NameIndex" ON "Files" (
"Name" COLLATE nocase ASC
);
Query 1 -
select * from table where Name like "%a%"
Query plan, as expected a full scan -
SCAN TABLE table
Time -
Result: 179202 rows returned in 53ms
Query 2, now using order by to read from index -
select * from table where Name like "%a%" order by Name collate nocase
Query plan, scan using index -
SCAN TABLE table USING INDEX NameIndex
Time -
Result: 179202 rows returned in 672ms
Used DB Browser for SQLite to get the information above, with default Pragmas.
I'd assume scanning the index would be as performant as scanning the table, is it not the case or am I doing something wrong?
Another interesting thing I noticed, that may be relevant -
Query 3 -
select * from table where Name like "a%"
Result: 23026 rows returned in 9ms
Query 4 -
select * from table where name like "a%" order by name collate nocase
Result: 23026 rows returned in 101ms
And both has them same query plan -
SEARCH TABLE table USING INDEX NameIndex (Name>? AND Name<?)
Is this expected? I'd assume the performance be the same if the plan was the same.
Thanks!
EDIT - The reason the query is slower was because I used select * and not select name, causing SQLite to go between the table and the index.
The solution was to use clustered index, thanks #Tomalak for helping me find it -
create table mytable (a text, b text, primary key (a,b)) without rowid
The table will be ordered by default using a + b combination, meaning that full scan queries will be much faster (now 90ms).
A LIKE pattern that starts with % can never use an index. It will always result in a full table scan (or index scan, if the query can be covered by the index itself).
It's logical when you think about it. Indexes are not magic. They are sorted lists of values, exactly like a keyword index in a book, and that means they are only only quick for looking up a word if you know how the given word starts. If you're searching for the middle part of a word, you would have to look at every index entry in a book as well.
Conclusion from the ensuing discussion in the comments:
The best course of action to get a table that always sorts by a non-unique column without a performance penalty is to create it without ROWID, and turn it into a clustering index over a the column in question plus a second column that makes the combination unique:
CREATE TABLE MyTable (
Name TEXT COLLATE NOCASE,
Id INTEGER,
Other TEXT,
Stuff INTEGER,
PRIMARY KEY(Name, Id) -- this will sort the whole table by Name
) WITHOUT ROWID;
This will result in a performance penalty for INSERT/UPDATE/DELETE operations, but in exchange sorting will be free since the table is already ordered.

How to search the database for a field which is a substring of the query by using sqlite

Problem description
I want to search for the query = Angela in a database from a table called Variations. The problem is that the database does not Angela. It contains Angel. As you can see the a is missing.
Searching procedure
The table that I want to query is the following:
"CREATE TABLE IF NOT EXISTS VARIATIONS
(ID INTEGER PRIMARY KEY NOT NULL,
ID_ENTITE INTEGER,
NAME TEXT,
TYPE TEXT,
LANGUAGE TEXT);"
To search for the query I am using fts4 because it is faster than LIKE% especially if I have a big database with more than 10 millions rows. I cannot also use the equality since i am looking for substrings.
I create a virtual table create virtual table variation_virtual using fts4(ID, ID_ENTITE, NAME, TYPE, LANGUAGE);
Filled the virtual table with VARIATIONS insert into variation_virtual select * from VARIATIONS;
The selection query is represented as follow:
SELECT ID_ENTITE, NAME FROM variation_virtual WHERE NAME MATCH "Angela";
Question
What am I missing in the query. What I am doing is the opposite of when we want to check if a query is a subtring of a string in a table.
You can't use fts4 for this. From the documentation:
SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux'; /* 0.03 seconds */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds */
Of course, the two queries above are not
entirely equivalent. For example the LIKE query matches rows that
contain terms such as "linuxophobe" or "EnterpriseLinux" (as it
happens, the Enron E-Mail Dataset does not actually contain any such
terms), whereas the MATCH query on the FTS3 table selects only those
rows that contain "linux" as a discrete token. Both searches are
case-insensitive.
So your query will only match strings that have 'Angela' as a word (at least that is how I interpret 'discrete token').

Fast search on a blob starting bytes in SQLite

Is there a way to index blob fields and have the index used for beginning of blob searches?
Currently I have hashes stored as hexadecimal in text fields.
These hashes in hexadecimal form are 32 characters long, and form the bulk of the data in the database.
Problem is, they are often searched by their starting bytes, as in
select * from mytable where hash like '00a1b2%'
I would like to store them as blobs, as this saves about 30% of the database size. However while
select * from mytable where hex(hash) like '00a1b2%'
works, it's also much slower and does not seem to use the index.
Searching for exact blob matches does use the index, so the index is working.
Is there a way to perform a search on a blob start (with binary/memcmp "collation") that would use the index?
I also tried substr(), it's apparently faster than hex() but still not indexed
select * from mytable where substr(hash, 1, 6) = x'00a1b2'
To be able to use an index for LIKE, the table column must have TEXT affinity, and the index must be case insensitive:
CREATE TABLE mytable(... hash TEXT, ...);
CREATE INDEX hash_index ON mytable(hash COLLATE NOCASE);
Functions like hex or substr prevent usage of indexes.
Blobs can be indexed and compared like other types.
This allows you to express a prefix search with two comparisons:
SELECT * FROM mytable WHERE hash >= x'00a1b2' AND hash < x'00a1b3'

SQLite: Ordering my select results

I have a table with unique usernames and a bunch of string data I am keeping track of. Each user will have 1000 rows and when I select them I want to return them in the order they were added. Is the following code a necessary and correct way of doing this:
CREATE TABLE foo (
username TEXT PRIMARY KEY,
col1 TEXT,
col2 TEXT,
...
order_id INTEGER NOT NULL
);
CREATE INDEX foo_order_index ON foo(order_id);
SELECT * FROM foo where username = 'bar' ORDER BY order_id;
Add a DateAdded field and default it to the date/time the row was added and sort on that.
If you absolutely must use the order_ID, which I don't suggest. Then at least make it an identity column. The reason I advise against this is because you are relying on side affects to do your sorting and it will make your code harder to read.
If each user will have 1000 rows, then username should not be the primary key. One option is to use the int identity column which all tables have (which optimizes I/O reads since it's typically stored in that order).
Read under "RowIds and the Integer Primary Key" # http://www.sqlite.org/lang_createtable.html
The data for each table in SQLite is stored as a B-Tree structure
containing an entry for each table row, using the rowid value as the
key. This means that retrieving or sorting records by rowid is fast.
Because it's stored in that order in the B-tree structure, it should be fast to order by the int primary key. Make sure it's an alias for rowid though - more in that article.
Also, if you're going to be doing queries where username = 'bob', you should consider an index on the username column - especially there's going to be many users which makes the index effective because of high selectivity. In contrast, adding an index on a column with values like 1 and 0 only leads to low selectivity and renders the index very ineffective. So, if you have 3 users :) it's not worth it.
You can remove the order_id column & index entirely (unless you need them for something other than this sorting).
SQLite tables always have a integer primary key - in this case, your username column has silently been made a unique key, so the table only has the one integer primary key. The key column is called rowid. For your sorting purpose, you'll want to explicitly make it AUTOINCREMENT so that every row always has a higher rowid than older rows.
You probably want to read http://www.sqlite.org/autoinc.html
CREATE TABLE foo (
rowid INTEGER PRIMARY KEY AUTOINCREMENT,
username TEXT UNIQUE KEY,
...
Then your select becomes
select * from foo order by rowed;
One advantage of this approach is that you're re-using the index SQLite will already be placing on your table. A date or order_id column is going to mean an extra index, which is just overhead here.

SQLITE: Unable to remove an unnamed primary key

I have a sqlite table that was originally created with:
PRIMARY KEY (`column`);
I now need to remove that primary key and create a new one. Creating a new one is easy, but removing the original seems to be the hard part. If I do
.indices tablename
I don't get the primary key. Some programs show the primary key as
Indexes: 1
[] PRIMARY
The index name is typically in the [].
Any ideas?
You can't.
PRAGMA INDEX_LIST('MyTable');
will give you a list of indices. This will include the automatically generated index for the primary key which will be called something like 'sqlite_autoindex_MyTable_1'.
But unfortunately you cannot drop this index...
sqlite> drop index sqlite_autoindex_MyTable_1;
SQL error: index associated with UNIQUE or PRIMARY KEY constraint cannot be dropped
All you can do is re-create the table without the primary key.
I the database glossary; a primary-key is a type of index where the index order is typically results in the physical ordering of the raw database records. That said any database engine that allows the primary key to be changed is likely reordering the database... so most do not and the operation is up to the programmer to create a script to rename the table and create a new one. So if you want to change the PK there is no magic SQL.
select * from sqlite_master;
table|x|x|2|CREATE TABLE x (a text, b text, primary key (`a`))
index|sqlite_autoindex_x_1|x|3|
You'll see that the second row returned from my quick hack has the index name in the second column, and the table name in the third. Try seeing if that name is anything useful.

Resources