I have the following tables:
-- table to keep articles titles
CREATE TABLE articles(id, title);
insert into articles (id,title) values (1,'sqlite example');
insert into articles (id,title) values (2,'sqlite query ');
insert into articles (id,title) values (3,'erlang example ');
insert into articles (id,title) values (4,'erlang otp ');
-- table to keep keywords that we would like to search.
create table keywords(id,keyword);
insert into keywords (id,keyword) values (1,'sqlite');
insert into keywords (id,keyword) values (2,'otp');
-- full text search table - copy of articles.
create virtual table articles_fts using fts4 (id,name);
-- populate table with articles titles.
insert into articles_fts(id,name) select id,title from articles;
Now I would like to find ALL articles that contain any of the specified keywords.
Query like:
select * from articles_fts
where name match (select keyword from keywords);
returns only titles containing sqlite (the first keyword); the other entries of the keywords table are ignored.
Question:
How could I find all articles that contain any of the specified keywords?
Thanks.
Use
select *
from articles_fts
WHERE articles_fts MATCH ( select group_concat(keyword, ' OR ')
from keywords
group by 'x' )
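By default, group_concat joins the aggregated keywords with commas; passing ' OR ' as its second argument uses that as the separator instead, so the subquery returns a single string such as sqlite OR otp, which is a valid FTS query. (A plain scalar subquery, as in the question, only ever yields the value from its first row, which is why only sqlite was matched.)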
Also see section 3 of the full text search feature reference regarding the OR keyword and the reference for aggregate functions.
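Here is a minimal, runnable sketch of the answer's query. It assumes Python's built-in sqlite3 module and a SQLite build with FTS4 enabled, which is true of most distributions. It uses the question's tables, with the last article's duplicated id changed to 4. The keyword order inside the generated string is not guaranteed, but that does not matter for OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles(id, title);
    INSERT INTO articles (id, title) VALUES
        (1, 'sqlite example'), (2, 'sqlite query'),
        (3, 'erlang example'), (4, 'erlang otp');
    CREATE TABLE keywords(id, keyword);
    INSERT INTO keywords (id, keyword) VALUES (1, 'sqlite'), (2, 'otp');
    CREATE VIRTUAL TABLE articles_fts USING fts4(id, name);
    INSERT INTO articles_fts(id, name) SELECT id, title FROM articles;
""")

# What the subquery produces on its own: one FTS expression such as 'sqlite OR otp'.
fts_query = conn.execute(
    "SELECT group_concat(keyword, ' OR ') FROM keywords").fetchone()[0]

# The answer's approach: feed the aggregated string straight into MATCH.
rows = conn.execute("""
    SELECT name FROM articles_fts
    WHERE articles_fts MATCH (SELECT group_concat(keyword, ' OR ') FROM keywords)
""").fetchall()
titles = sorted(name for (name,) in rows)

print(fts_query)
print(titles)  # every title containing sqlite or otp
```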
Related
Querying an FTS5 virtual table returns nothing for suffix (postfix) searches.
It can only search for entire word tokens, or for token prefixes if I append * to the search term.
For example, it does not find the qwerty.png row if I search for werty.
CREATE TABLE IF NOT EXISTS files (name TEXT, id INTEGER);
INSERT INTO files (name, id) VALUES ('qwerty.png', 1), ('asdfgh.png', 2);
CREATE VIRTUAL TABLE IF NOT EXISTS names USING FTS5(name);
INSERT INTO names (name) SELECT name FROM files;
SELECT *
FROM names
WHERE name MATCH 'werty';
It only works for prefix searches (qwerty, qwer*, qwe*, ...).
I can't use * at the start of the search (*werty), since it produces an error.
Is it possible to make the indexed text search work as if I were using
SELECT *
FROM names
WHERE name like '%wert%';
?
I just want a fast substring search without a full table scan.
Perhaps try the experimental trigram tokenizer for FTS5 (available in SQLite 3.34.0 and later):
When using the trigram tokenizer, a query or phrase token may match any sequence of characters within a row, not just a complete token.
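The sketch below applies the trigram tokenizer to the question's own tables. It assumes Python's sqlite3 module is linked against SQLite 3.34.0 or newer with FTS5 enabled. According to the FTS5 documentation, search strings must be at least three characters long. On a trigram table, LIKE and GLOB patterns can also be answered from the full-text index instead of a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (name TEXT, id INTEGER);
    INSERT INTO files (name, id) VALUES ('qwerty.png', 1), ('asdfgh.png', 2);
    -- trigram tokenizer: every 3-character sequence of the text is indexed
    CREATE VIRTUAL TABLE names USING fts5(name, tokenize="trigram");
    INSERT INTO names (name) SELECT name FROM files;
""")

# Substring search with MATCH: 'werty' is found inside 'qwerty.png'.
match_rows = conn.execute(
    "SELECT name FROM names WHERE name MATCH 'werty'").fetchall()

# The same substring test written as LIKE; on a trigram table this can use the index.
like_rows = conn.execute(
    "SELECT name FROM names WHERE name LIKE '%wert%'").fetchall()

print(match_rows)
print(like_rows)
```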
I have a table indexed on a text column, and I want all my queries to return results ordered by name without any performance hit.
The table has around 1 million rows, if it matters.
Table -
CREATE TABLE Files (Name TEXT)
Index -
CREATE INDEX "NameIndex" ON "Files" (
"Name" COLLATE nocase ASC
);
Query 1 -
select * from Files where Name like '%a%'
Query plan, as expected a full scan -
SCAN TABLE Files
Time -
Result: 179202 rows returned in 53ms
Query 2, now using order by to read from the index -
select * from Files where Name like '%a%' order by Name collate nocase
Query plan, a scan using the index -
SCAN TABLE Files USING INDEX NameIndex
Time -
Result: 179202 rows returned in 672ms
I used DB Browser for SQLite to get the information above, with the default pragmas.
I'd assume scanning the index would be as fast as scanning the table. Is that not the case, or am I doing something wrong?
Another interesting thing I noticed, which may be relevant -
Query 3 -
select * from Files where Name like 'a%'
Result: 23026 rows returned in 9ms
Query 4 -
select * from Files where Name like 'a%' order by Name collate nocase
Result: 23026 rows returned in 101ms
Both have the same query plan -
SEARCH TABLE Files USING INDEX NameIndex (Name>? AND Name<?)
Is this expected? I'd assume the performance would be the same if the plan was the same.
Thanks!
EDIT - The query was slower because I used select * instead of select Name, which forced SQLite to jump back and forth between the index and the table for every row.
The solution was to use a clustered index (thanks @Tomalak for helping me find it) -
create table mytable (a text, b text, primary key (a,b)) without rowid
The table is stored sorted by the (a, b) combination, meaning that full-scan queries are much faster (now 90ms).
A LIKE pattern that starts with % can never use an index. It will always result in a full table scan (or index scan, if the query can be covered by the index itself).
It's logical when you think about it. Indexes are not magic. They are sorted lists of values, exactly like the keyword index in a book, and that means they are only quick for looking up a word if you know how the word starts. If you're searching for the middle part of a word, you would have to look at every index entry in a book as well.
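You can observe this with EXPLAIN QUERY PLAN by comparing a prefix pattern against a pattern that starts with %. The sketch assumes Python's sqlite3 module and the question's Files table and NameIndex. It adds a hypothetical Id column so that SELECT * is not covered by the index. The exact wording of plan text varies between SQLite versions, so the test relies only on the SEARCH/SCAN keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Files (Name TEXT, Id INTEGER);  -- Id is a hypothetical extra column
    CREATE INDEX NameIndex ON Files (Name COLLATE NOCASE);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Known prefix: the LIKE optimization turns it into an index range search.
prefix_plan = plan("SELECT * FROM Files WHERE Name LIKE 'a%'")
# Leading %: no usable range, so every row has to be examined.
infix_plan = plan("SELECT * FROM Files WHERE Name LIKE '%a%'")

print(prefix_plan)
print(infix_plan)
```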
Conclusion from the ensuing discussion in the comments:
The best course of action to get a table that always sorts by a non-unique column without a performance penalty is to create it as a WITHOUT ROWID table, turning its primary key into a clustered index over the column in question plus a second column that makes the combination unique:
CREATE TABLE MyTable (
Name TEXT COLLATE NOCASE,
Id INTEGER,
Other TEXT,
Stuff INTEGER,
PRIMARY KEY(Name, Id) -- this will sort the whole table by Name
) WITHOUT ROWID;
This will result in a performance penalty for INSERT/UPDATE/DELETE operations, but in exchange sorting will be free since the table is already ordered.
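One quick way to check that the clustered layout removes the sort step is to compare query plans. The sketch assumes Python's sqlite3 module and compares MyTable above against a hypothetical PlainTable. PlainTable has the same columns but an ordinary rowid and no index. When SQLite has to sort, EXPLAIN QUERY PLAN reports a USE TEMP B-TREE FOR ORDER BY step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical comparison table: rowid storage, no index on Name.
    CREATE TABLE PlainTable (Name TEXT COLLATE NOCASE, Id INTEGER, Other TEXT, Stuff INTEGER);
    -- The answer's table: rows are stored in (Name, Id) order.
    CREATE TABLE MyTable (
        Name TEXT COLLATE NOCASE,
        Id INTEGER,
        Other TEXT,
        Stuff INTEGER,
        PRIMARY KEY (Name, Id)
    ) WITHOUT ROWID;
""")

def plan(sql):
    # Join the detail strings of all EXPLAIN QUERY PLAN rows.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plain_plan = plan("SELECT * FROM PlainTable ORDER BY Name")     # needs an explicit sort
clustered_plan = plan("SELECT * FROM MyTable ORDER BY Name")    # already in Name order

print(plain_plan)
print(clustered_plan)
```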
Problem description
I want to search for the query Angela in a table called Variations. The problem is that the database does not contain Angela; it contains Angel. As you can see, the final a is missing.
Searching procedure
The table that I want to query is the following:
"CREATE TABLE IF NOT EXISTS VARIATIONS
(ID INTEGER PRIMARY KEY NOT NULL,
ID_ENTITE INTEGER,
NAME TEXT,
TYPE TEXT,
LANGUAGE TEXT);"
To search, I am using FTS4 because it is faster than LIKE '%...%', especially for a big database with more than 10 million rows. I also cannot use equality, since I am looking for substrings.
I created a virtual table: create virtual table variation_virtual using fts4(ID, ID_ENTITE, NAME, TYPE, LANGUAGE);
I filled the virtual table from VARIATIONS: insert into variation_virtual select * from VARIATIONS;
The selection query is as follows:
SELECT ID_ENTITE, NAME FROM variation_virtual WHERE NAME MATCH 'Angela';
Question
What am I missing in the query? What I want is the opposite of the usual check: instead of testing whether the query is a substring of a string in the table, I want the strings in the table that are substrings of the query.
You can't use fts4 for this. From the documentation:
SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux'; /* 0.03 seconds */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds */
Of course, the two queries above are not entirely equivalent. For example the LIKE query matches rows that contain terms such as "linuxophobe" or "EnterpriseLinux" (as it happens, the Enron E-Mail Dataset does not actually contain any such terms), whereas the MATCH query on the FTS3 table selects only those rows that contain "linux" as a discrete token. Both searches are case-insensitive.
So your query will only match strings that have 'Angela' as a word (at least that is how I interpret 'discrete token').
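The small sketch below shows that MATCH 'Angela' finds nothing when only Angel is stored. It assumes Python's sqlite3 module with FTS4 enabled, plus a VARIATIONS table holding made-up rows.

The reverse check asks whether the stored NAME is contained in the search string. It can be written as an ordinary LIKE with the operands swapped. That is a different technique from full-text search: it cannot use the FTS index and examines every row. Any % or _ inside NAME would also act as a wildcard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VARIATIONS (ID INTEGER PRIMARY KEY NOT NULL, ID_ENTITE INTEGER,
                             NAME TEXT, TYPE TEXT, LANGUAGE TEXT);
    INSERT INTO VARIATIONS (ID_ENTITE, NAME) VALUES (100, 'Angel'), (200, 'Maria');
    CREATE VIRTUAL TABLE variation_virtual USING fts4(ID, ID_ENTITE, NAME, TYPE, LANGUAGE);
    INSERT INTO variation_virtual SELECT * FROM VARIATIONS;
""")

# MATCH compares whole tokens: 'angela' never equals the stored token 'angel'.
fts_rows = conn.execute(
    "SELECT NAME FROM variation_virtual WHERE NAME MATCH 'Angela'").fetchall()

# Reverse containment on the ordinary table: keep rows whose NAME occurs inside
# the search string. This is a row-by-row LIKE, i.e. a full table scan.
reverse_rows = conn.execute(
    "SELECT NAME FROM VARIATIONS WHERE ? LIKE '%' || NAME || '%'",
    ("Angela",)).fetchall()

print(fts_rows)
print(reverse_rows)
```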
I am trying to implement an idea where I have two SQL tables in a database:
table Info, which has a field Nationality, and table Exclusion, which has a field Keyword.
| Info.Nationality     | Exclusion.Keyword |
|----------------------|-------------------|
| British              | France            |
| restaurant de France | Spanish           |
| German               |                   |
| Flag Italian         |                   |
| Spanish rice         |                   |
| Italian pasta        |                   |
| Irish beef           |                   |
In my web application I create GridView4 and populate it, through a DataTable and a SqlDataAdapter, with the SQL command:
SELECT DISTINCT Info.Nationality FROM Info WHERE Info.Nationality NOT LIKE '%Spanish%'
That SQL statement retrieves all the distinct records in Info.Nationality that do not contain the word Spanish.
Currently, in the web app (written in VB.NET), I add two GridViews, one for each table: GridView2 holds DISTINCT Info.Nationality and GridView3 holds Exclusion.Keyword. A further grid, GridView4, displays the results of the SQL command above.
The idea is to retrieve all the distinct records from Info.Nationality that are not suppressed by the keywords in Exclusion.Keyword. With the SQL command above, GridView4 shows all the records that do not contain the word "Spanish".
I do all of this in a nested loop: the outer loop takes each record of Info.Nationality one by one (e.g. For Each row As GridViewRow In Me.GridView2.Rows) and compares it with every keyword in an inner loop over Exclusion.Keyword (something like For i = 0 To GridView3.Rows.Count - 1).
The problem is that the SQL statement requires me to spell out the word to compare. I tried concatenating the records of Exclusion.Keyword into a String named Keywords, replacing Spanish in the NOT LIKE with a parameter, and assigning it with cmd.Parameters.AddWithValue("@String", Keywords). However, this does not work: it only compares against the last record in the string and ignores the first.
The goal is to display in GridView4 all the records of Info.Nationality that contain none of the keywords in Exclusion.Keyword.
Is there an easier or more efficient way to do this? I was thinking of an INNER JOIN with LIKE, but that is not my problem. My problem is how to compare each record of Info.Nationality with all the records in Exclusion.Keyword, keep the ones that do not match, and discard the ones that do.
Also, how can I edit the records in GridView4 so that the changes do not affect Info.Nationality but are only inserted into Exclusion.Keyword?
In my ASP.NET web app I tried the following, which at first didn't work (SOLVED by adding ToString() after Text):
`SELECT DISTINCT Nationality
FROM Info WHERE NOT EXISTS(SELECT * FROM Exclusion WHERE Info.Nationality LIKE '%' + @GridView + '%')`
`cmd.Parameters.AddWithValue("@GridView", GridView3.Rows(i).Cells(0).Text.ToString())`
Here, GridView3 holds the Exclusion.Keyword data.
Would really appreciate your suggestions and thoughts around this.
You do not need to do this one by one, or "Row By Agonizing Row" as some DBAs like to call this approach. There are many ways to write a single query that returns only the records from Info.Nationality that match none of the exclusion keywords.
My preference is to use the EXISTS clause and a correlated subquery:
SELECT Nationality
FROM Info I
WHERE NOT EXISTS(SELECT * FROM Exclusion WHERE I.Nationality LIKE '%' + Keyword + '%')
You can also express this as a left join.
SELECT I.Nationality
FROM Info I
LEFT OUTER JOIN Exclusion E
ON I.Nationality LIKE '%' + E.Keyword + '%'
WHERE E.Keyword IS NULL
The left join returns every row from Info, with NULLs in the Exclusion columns wherever the join condition does not match. Filtering for only the rows where those values are NULL removes the matches.
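Both queries above use SQL Server string concatenation (+). Below is a sketch of the same two anti-join patterns in SQLite, run through Python's sqlite3 module; in SQLite, || is the concatenation operator. It uses the sample data from the question. DISTINCT and ORDER BY are added to mirror the question's SELECT DISTINCT and to make the output stable. Note that SQLite's LIKE is case-insensitive for ASCII, while in SQL Server that depends on the collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Info (Nationality TEXT);
    CREATE TABLE Exclusion (Keyword TEXT);
    INSERT INTO Info VALUES ('British'), ('restaurant de France'), ('German'),
        ('Flag Italian'), ('Spanish rice'), ('Italian pasta'), ('Irish beef');
    INSERT INTO Exclusion VALUES ('France'), ('Spanish');
""")

# NOT EXISTS with a correlated subquery: drop a row if any keyword occurs in it.
not_exists = [r for (r,) in conn.execute("""
    SELECT DISTINCT Nationality
    FROM Info I
    WHERE NOT EXISTS (SELECT 1 FROM Exclusion WHERE I.Nationality LIKE '%' || Keyword || '%')
    ORDER BY Nationality
""")]

# LEFT JOIN anti-join: rows without any matching keyword get NULL in E.Keyword.
left_join = [r for (r,) in conn.execute("""
    SELECT DISTINCT I.Nationality
    FROM Info I
    LEFT OUTER JOIN Exclusion E ON I.Nationality LIKE '%' || E.Keyword || '%'
    WHERE E.Keyword IS NULL
    ORDER BY I.Nationality
""")]

print(not_exists)
print(left_join)
```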
I currently have a Diagnosis table, and I want to make the code and description fields searchable using FTS. As I understand it, though, FTS tables don't support indexes, and I need to be able to look up a diagnosis by diagnosisID very quickly. Will I have to create a second virtual table with all of the data duplicated just for full-text searching, or am I missing a solution where I don't have to duplicate all of my diagnosis codes and descriptions?
CREATE TABLE Diagnosis (
diagnosisID INTEGER PRIMARY KEY NOT NULL,
code TEXT,
collect INTEGER NOT NULL,
description TEXT
);
It turns out an FTS table has a hidden rowid column (also available as docid in FTS3/FTS4), which you can populate when you insert data:
sqlite> create virtual table test1 using fts3;
sqlite> insert into test1 values ('This is a document!');
sqlite> insert into test1(docid,content) values (5,'this is another document');
sqlite> select rowid,* from test1;
1|This is a document!
5|this is another document
You could create an integer field in your standard table that refers to the FTS table by rowid, and move the columns you wish to make text-searchable into the FTS table.
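Here is a sketch of that layout, assuming Python's sqlite3 module with FTS4 enabled. It simplifies the suggestion slightly: instead of storing a separate reference column, it sets the FTS docid equal to diagnosisID. The table name Diagnosis_fts, the helper add_diagnosis and the sample rows are all hypothetical. Because docid is the FTS table's rowid, fetching a row by it is a rowid lookup rather than a full-text scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Non-text columns stay in the ordinary table.
    CREATE TABLE Diagnosis (
        diagnosisID INTEGER PRIMARY KEY NOT NULL,
        collect INTEGER NOT NULL
    );
    -- Searchable text lives only in the FTS table (hypothetical name).
    CREATE VIRTUAL TABLE Diagnosis_fts USING fts4(code, description);
""")

def add_diagnosis(conn, diagnosis_id, code, collect, description):
    # Hypothetical helper: the FTS docid is set to the same value as diagnosisID.
    conn.execute("INSERT INTO Diagnosis (diagnosisID, collect) VALUES (?, ?)",
                 (diagnosis_id, collect))
    conn.execute("INSERT INTO Diagnosis_fts (docid, code, description) VALUES (?, ?, ?)",
                 (diagnosis_id, code, description))

# Made-up sample rows.
add_diagnosis(conn, 10, "S52.5", 1, "Fracture of lower end of radius")
add_diagnosis(conn, 20, "J45", 0, "Asthma")

# Lookup by id: primary-key lookup plus a docid (rowid) lookup in the FTS table.
by_id = conn.execute("""
    SELECT d.diagnosisID, Diagnosis_fts.code, Diagnosis_fts.description
    FROM Diagnosis d JOIN Diagnosis_fts ON Diagnosis_fts.docid = d.diagnosisID
    WHERE d.diagnosisID = ?""", (20,)).fetchone()

# Full-text search, joined back to the ordinary table by docid.
hits = conn.execute("""
    SELECT d.diagnosisID, d.collect
    FROM Diagnosis_fts JOIN Diagnosis d ON d.diagnosisID = Diagnosis_fts.docid
    WHERE Diagnosis_fts MATCH ?""", ("fracture",)).fetchall()

print(by_id)
print(hits)
```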
All the info you need here :)