How to search the database for a field which is a substring of the query by using sqlite - sqlite

Problem description
I want to search for the query = Angela in a database from a table called Variations. The problem is that the database does not Angela. It contains Angel. As you can see the a is missing.
Searching procedure
The table that I want to query is the following:
"CREATE TABLE IF NOT EXISTS VARIATIONS
(ID INTEGER PRIMARY KEY NOT NULL,
ID_ENTITE INTEGER,
NAME TEXT,
TYPE TEXT,
LANGUAGE TEXT);"
To search for the query I am using fts4 because it is faster than LIKE% especially if I have a big database with more than 10 millions rows. I cannot also use the equality since i am looking for substrings.
I create a virtual table create virtual table variation_virtual using fts4(ID, ID_ENTITE, NAME, TYPE, LANGUAGE);
Filled the virtual table with VARIATIONS insert into variation_virtual select * from VARIATIONS;
The selection query is represented as follow:
SELECT ID_ENTITE, NAME FROM variation_virtual WHERE NAME MATCH "Angela";
Question
What am I missing in the query. What I am doing is the opposite of when we want to check if a query is a subtring of a string in a table.

You can't use fts4 for this. From the documentation:
SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux'; /* 0.03 seconds */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds */
Of course, the two queries above are not
entirely equivalent. For example the LIKE query matches rows that
contain terms such as "linuxophobe" or "EnterpriseLinux" (as it
happens, the Enron E-Mail Dataset does not actually contain any such
terms), whereas the MATCH query on the FTS3 table selects only those
rows that contain "linux" as a discrete token. Both searches are
case-insensitive.
So your query will only match strings that have 'Angela' as a word (at least that is how I interpret 'discrete token').

Related

Lossless SQLite FTS5 search of a substring

Using a FTS5 virtual table returns nothing for postfix searches.
It only can search for the entire word tokens, or for the prefixes of the word tokens if I append * to the search.
For example, it does not find qwerty.png row, if I search for werty.
CREATE TABLE IF NOT EXISTS files (name TEXT, id INTEGER);
INSERT INTO files (name, id) VALUES ('qwerty.png', 1), ('asdfgh.png', 2);
CREATE VIRTUAL TABLE IF NOT EXISTS names USING FTS5(name);
INSERT INTO names (name) SELECT name FROM files;
SELECT *
FROM names
WHERE name MATCH 'werty';
It only works for prefix searches (qwerty, qwer*, qwe*, ...).
I can't use * at the start of the search (*werty), since it produces an error.
Is possibly to make the indexed text search working as if I would use
SELECT *
FROM names
WHERE name like '%wert%';
?
I just want to have the fast search for a substring without the full table scan.
Perhaps try the experimental trigram tokenizer
When using the trigram tokenizer, a query or phrase token may match any sequence of characters within a row, not just a complete token.

SQLite queryslow when using index

I have a table indexed on a text column, and I want all my queries to return results ordered by name without any performance hit.
Table has around 1 million rows if it matters.
Table -
CREATE TABLE table (Name text)
Index -
CREATE INDEX "NameIndex" ON "Files" (
"Name" COLLATE nocase ASC
);
Query 1 -
select * from table where Name like "%a%"
Query plan, as expected a full scan -
SCAN TABLE table
Time -
Result: 179202 rows returned in 53ms
Query 2, now using order by to read from index -
select * from table where Name like "%a%" order by Name collate nocase
Query plan, scan using index -
SCAN TABLE table USING INDEX NameIndex
Time -
Result: 179202 rows returned in 672ms
Used DB Browser for SQLite to get the information above, with default Pragmas.
I'd assume scanning the index would be as performant as scanning the table, is it not the case or am I doing something wrong?
Another interesting thing I noticed, that may be relevant -
Query 3 -
select * from table where Name like "a%"
Result: 23026 rows returned in 9ms
Query 4 -
select * from table where name like "a%" order by name collate nocase
Result: 23026 rows returned in 101ms
And both has them same query plan -
SEARCH TABLE table USING INDEX NameIndex (Name>? AND Name<?)
Is this expected? I'd assume the performance be the same if the plan was the same.
Thanks!
EDIT - The reason the query is slower was because I used select * and not select name, causing SQLite to go between the table and the index.
The solution was to use clustered index, thanks #Tomalak for helping me find it -
create table mytable (a text, b text, primary key (a,b)) without rowid
The table will be ordered by default using a + b combination, meaning that full scan queries will be much faster (now 90ms).
A LIKE pattern that starts with % can never use an index. It will always result in a full table scan (or index scan, if the query can be covered by the index itself).
It's logical when you think about it. Indexes are not magic. They are sorted lists of values, exactly like a keyword index in a book, and that means they are only only quick for looking up a word if you know how the given word starts. If you're searching for the middle part of a word, you would have to look at every index entry in a book as well.
Conclusion from the ensuing discussion in the comments:
The best course of action to get a table that always sorts by a non-unique column without a performance penalty is to create it without ROWID, and turn it into a clustering index over a the column in question plus a second column that makes the combination unique:
CREATE TABLE MyTable (
Name TEXT COLLATE NOCASE,
Id INTEGER,
Other TEXT,
Stuff INTEGER,
PRIMARY KEY(Name, Id) -- this will sort the whole table by Name
) WITHOUT ROWID;
This will result in a performance penalty for INSERT/UPDATE/DELETE operations, but in exchange sorting will be free since the table is already ordered.

SQLite: insert as Integer and as Text

Given we have a simple table like
CREATE TABLE A(
amount INTEGER
);
What is the difference between queries
INSERT INTO A VALUES(4);
and
INSERT INTO A VALUES('12');
As seen in schema, amount is an INTEGER column. The first query operates with just that - an integer, but the second one operates with a string '12'. Yet both queries work just fine, the table gets values 4 and 12, and can select or, say, sum them up correctly as two valid Integers:
SELECT sum(amount) AS "Total" FROM A;
correctly yields 16.
So is there a difference between inserting an integer as (4) and inserting it as ('12') into the INTEGER-type column?
SQLite tries to convert your String into an Integer before inserting the value into your table as described in the manual.
The type affinity of a column is the recommended type for data stored in that column. The important idea here is that the type is recommended, not required. Any column can still store any type of data. It is just that some columns, given the choice, will prefer to use one storage class over another.

Indexes with custom collations in sqlite

Assuming I have a schema like this:
CREATE TABLE abc(
id INTEGER PRIMARY KEY AUTOINCREMENT,
txt TEXT
);
CREATE INDEX "txtCS" ON "abc"("txt" COLLATE MY_CUSTOM_SORT);
when will sqlite use my index on txt ?
because I ran:
EXPLAIN QUERY PLAN SELECT * FROM abc ORDER BY txt COLLATE MY_CUSTOM_SORT DESC ...
and it tells me that it scans the table, twice, using the txtCS index (It doesn't search like I expected.)
MY_CUSTOM_SORT is my own sorting function that I hooked with sqliteCreateCollation. I just need that index for some queries that involve special ordering and I want them to be fast
In the EXPLAIN QUERY PLAN output, SEARCH means that the database tries to look up some particular record(s) with specific values, while SCAN means that the database goes through the entire table.
This query returns all records, so the most efficient operation is a SCAN.
Either operation can be sped up with an index.
(In a SCAN, the database just goes through all index entries in order.)

Efficiently match the first characters of an indexed SQLite field

I have two tables in SQLite:
CREATE TABLE article (body TEXT)
CREATE TABLE article_word (
word TEXT,
article_rowid INTEGER,
FOREIGN KEY(article_rowid) REFERENCES article(rowid),
PRIMARY KEY(word, article_rowid)
)
The program stores long strings in article.body and, for each word in the string, it stores a lowercase version of the word along with the article's rowid in article_word.
I want to let the user search for articles by the first case-insensitive characters in a word, so a search for baz yields an article containing foobar Bazquux spambacon.
How can I modify the tables/add more (if necessary) and query them for matches optimally? Does
SELECT a.rowid, a.body FROM article a, article_word w WHERE w.word LIKE "baz%" AND a.rowid = w.article_rowid
take advantage of the PRIMARY KEY index on article_word.word or does it naïvely search every row?
Use NSPredicate to retrive specific Attribute According to your
requirement, and you can also do Mapping with Sqlite as in core Data.

Resources