sqlite3 fts3 multiple columns search including special characters - sqlite

i am using sqlite3 fts3. (sqlite3 version is 3.7.17)
I tried to search keywords including special characters (ex. #, ?) in multiple columns.
This is my examples.
SELECT * FROM table_name WHERE table_name MATCH
'EMAIL:aaa#test.com OR SUBJECT:is it a question?'
This query have to return a result having email address is 'aaa#test.com' or subject is 'is it a question?'
But this query is not return correct results.
I think that sqlite3 fts3 can't recognize special characters...
How can i solve this problem? :(

To do a phrase query, you must use quotes.
Special characters are filtered out by the default tokenizer; aaa#test.com must be handled as a phrase with three words.

Related

How to create virtual table FTS with external sqlite content table?

I want to create a SQLite virtual table with a content of a real one.
I have a small sample which demonstrates my problem. I already red the official tutorial, but can't find anything wrong in this code. Some users use a rebuild option, but it doesn't work for me.
CREATE TABLE if NOT EXISTS posts (a INTEGER PRIMARY KEY);
INSERT OR IGNORE INTO posts (a) VALUES(510000);
INSERT OR IGNORE INTO posts (a) VALUES(510001);
INSERT OR IGNORE INTO posts (a) VALUES(510300);
CREATE VIRTUAL TABLE IF NOT EXISTS posts_fts using fts5(content=posts, content_rowid=a, a);
SELECT * FROM posts_fts where posts_fts MATCH '10' ORDER BY a ASC;
If I run this, I get:
0 rows returned in 2ms from: SELECT * FROM posts_fts where posts_fts match '10' ORDER BY a ASC;
Does anyone have an idea wat I do wrong?
"10" is not a token in the FTS table.
From the doc:
4.3.1. Unicode61 Tokenizer
The unicode tokenizer classifies all unicode characters as either
"separator" or "token" characters. By default all space and
punctuation characters, as defined by Unicode 6.1, are considered
separators, and all other characters as token characters. More
specifically, all unicode characters assigned to a general category
beginning with "L" or "N" (letters and numbers, specifically) or to
category "Co" ("other, private use") are considered tokens. All other
characters are separators.
Each contiguous run of one or more token characters is considered to
be a token. The tokenizer is case-insensitive according to the rules
defined by Unicode 6.1.
Also from the doc:
3.2. FTS5 Phrases
FTS queries are made up of phrases. A phrase is an ordered list of one
or more tokens.
You might try a "prefix query" i.e. MATCH "5*" to see that you get results.

How to perform SQLite LIKE queries with wildcards not read as wildcards

I'm running into a problem in SQLite when querying on text fields that happen to have the _ or % wildcard characters.
I have a table with a 'Name' field I want to query on. Two of my records have the value 'test' and 'te_t' in the 'Name' field I want to query on. If I run a query like below
"SELECT ALL * from Table WHERE Name LIKE 'te_t'"
This will return both the 'te_t' and 'test' records, because of '_' being read as a wildcard. How do I make it so that I only get the 'te_t' record from the above query?
I've done some research on this and read that I should be able to throw a backslash '\' character in front of the wildcard to get it to be read as a normal _ character instead of a wildcard. But when I try the query
"SELECT ALL * from Table WHERE Name LIKE 'te\_t'"
my query returns zero matches.
What am I doing wrong? Is this just not possible in SQLite?
In SQL, you can escape special characters in the LIKE pattern if you declare some escape character with ESCAPE:
SELECT * FROM MyTable WHERE Name LIKE 'te\_t' ESCAPE '\'
(see the documentation)

Why does SQLite full-text search (FTS4) treat angle brackets differently in a compound search?

I have an SQLite database using FTS4. It is used to store emails with message id's of the form:
Searching for messages using the FTS MATCH syntax, I get a result from:
SELECT rowid FROM emails WHERE emails MATCH '<8200#comms.io>'
This returns the correct row. But when I try to find multiple emails, I get an empty response:
SELECT rowid FROM emails WHERE emails MATCH '<8200#comms.io> OR <8188#comms.io>'
Strangely though, I can search without the angle bracket characters. This returns both rows:
SELECT rowid FROM emails WHERE emails MATCH '8200#comms.io OR 8188#comms.io'
This even though the angle brackets are present in the stored columns. I can find no mention that these are special characters in SQLite, and without the 'OR', the single-term search works fine.
Why are these characters treated differently in my compound search?
The default (simple) tokenizer reads alphanumerical characters and treats all others as word separators to be ignored.
So when searching for a message ID, you have to actually search for a phrase with multiple words (8200, comms, and io).
If you want to treat the entire message ID as a word, you have to write a custom tokenizer.

SQL Server & ASP .NET encoding issue

my page has utf-8 meta element added + sql server encoding is also utf. However when I create record and try to issue SELECT statement with condition that contains POLISH characters like 'ń' , I see no results. Any ideas what am I missing?
ALSO Sql management studio shows result with POLISH characters , but I don't trust it.... I guess something is wrong with putting record into database...
Or how can I troubleshoot it?
Thanks,Paweł
I had the same issue, and I solved it by prefixing the text in the WHERE clause with "N".
For example, I have a table 'Person' containing a bit over 21,000 names of people. A person with the last name "Krzemiński" was recently added to the database, and the name appears normal when the row is displayed (i.e., the "ń" character is displayed correctly). However, neither of the following statements returned any records:
SELECT * FROM Person WHERE FamilyName='Krzemiński
SELECT * FROM Person WHERE FamilyName LIKE 'Krzemiń%'
...but these statements both returned the correct record:
SELECT * FROM Person WHERE FamilyName LIKE 'Krzemi%'<br>
SELECT * FROM Person WHERE FamilyName LIKE 'Krzemi%ski'
When I executed the following statement:
SELECT * FROM Person WHERE FamilyName LIKE '%ń%'
I get all 8900 records that contain the letter "n" (no diacritic), but I do not get the record that contains the "ń" character. I tried this last query with all of the Polish characters (ąćęłńóśźż), and all of them except "ó" exhibit the same behavior (i.e., return all records with the lower-ASCII equivalent character). Weirdly, "ó" works as it should, returning only those records with an "ó" in the FamilyName field.
In any case, the solution was to prefix the search criterion with "N", to explicitly declare it as Unicode.
Thus, the following statements:
SELECT * FROM Person WHERE FamilyName LIKE N'%ń%'
SELECT * FROM Person WHERE FamilyName=N'Krzemiński'
...both return the correct set of records.
The reason I was confused is that I have MANY records with weird diacritics, and they all return the correct records even without the "N" prefix. So far, the only characters I've found that require the explicit "N" prefix are the Polish characters.
According to this (Archived) Microsoft Support Issue:
You must precede all Unicode strings with a prefix N when you deal with Unicode string constants in SQL Server
simply use nvarchar instead of varchar as the datatype of the column saving the record.

NHibernate, SQLite, and Cyrillic characters: case sensitivity and fallback queries

I'm querying an SQLite database using NHibernate. Generally, I want to do case insensitive string queries. Recently, I've discovered that although I can insert a row with Cyrillic characters, I can not select it using a case insensitive query. This is what the query looks like:
string foo = "foo";
IList<T> list = session.CreateCriteria(typeof(T)).
Add(Expression.Eq("Foo", foo).IgnoreCase()).List<T>();
I can, however, select the row using the above query if IgnoreCase() is removed. A naive fix would be to check if list.Count == 0 after the first query, and make a subsequent case sensitive query. The major downside of this approach is that querying for non-existent rows is a reasonably common operation that would now consist of two queries.
The question is, how can I construct a single query that will select from the Foo column that is case insensitive yet will also select rows that contain Cyrillic characters?
Case insensitive queries by default only work with ASCII characters in SQLite.
See this FAQ: Case-insensitive matching of Unicode characters does not work.

Resources