SQLite Virtual Table Match Escape character - sqlite

I'm working on an application where the indices are stored in a SQLite FTS3 virtual table. We are implementing full-text matches, which means we send through queries like:
select * from blah where term match '<insert term here>'
That's all well and good until the term we want to match contains a hyphen, in which case the SQLite FTS match syntax interprets bacon-and-eggs as bacon, not and, not eggs.
Does anyone know of an escape character to make the fts table ignore the hyphen? I tried adding an ESCAPE '\' clause and using \ before each hyphen but the match statement rejects that syntax.
Thanks.

There are lots of strings that FTS considers "special" and that need to be escaped. The easiest way to do that is to add DOUBLE quotes around the string you want to search for.
Example 1: Say the term you want to search for is bacon-and-eggs.
select * from blah where term match '"bacon-and-eggs"'
This also treats the whole string as a phrase, so rows that contain the same words in a different order are not matched. To get around that, you can quote each word separately.
Example 2: Say the term you want to search for is bacon and eggs.
select * from blah where term match '"bacon" "and" "eggs"'
Hope this helps someone!
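A minimal sketch of how the quoting above could be automated before the term is sent to MATCH. The function name quote_fts_terms and the choice to drop embedded double quotes (rather than escape them) are assumptions made for illustration, not something FTS itself prescribes:

```python
def quote_fts_terms(text, as_phrase=False):
    """Wrap user input in double quotes so FTS does not treat '-' as an operator."""
    # Assumption: embedded double quotes are dropped instead of escaped.
    words = text.replace('"', " ").split()
    if as_phrase:
        # One quoted phrase: the words must appear in this order.
        return '"' + " ".join(words) + '"'
    # Each word quoted on its own: the words may appear in any order.
    return " ".join('"' + word + '"' for word in words)


print(quote_fts_terms("bacon-and-eggs", as_phrase=True))
print(quote_fts_terms("bacon and eggs"))
```

The returned string is meant to be bound as a parameter (select * from blah where term match ?) rather than pasted into the SQL text.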

This question is older and involves fts3, but I thought I would add an update to show how you can do this using the newer fts5.
Let's start by setting up a test environment on the command line:
$ sqlite3 ":memory:"
Then creating an fts5 table that can handle the dash:
sqlite> CREATE VIRTUAL TABLE IF NOT EXISTS blah USING fts5(term, tokenize="unicode61 tokenchars '-'");
Notice the subtle use of double and single quotes in the tokenize value.
With setup out of the way, let's add some values to search for:
sqlite> INSERT INTO blah (term) VALUES ('bacon-and-eggs');
sqlite> INSERT INTO blah (term) VALUES ('bacon');
sqlite> INSERT INTO blah (term) VALUES ('eggs');
Then let's actually search for them:
sqlite> SELECT * from blah WHERE term MATCH '"bacon-and-eggs"';
bacon-and-eggs
sqlite> SELECT * from blah WHERE term MATCH '"bacon"*';
bacon-and-eggs
bacon
Once again, notice the subtle use of double and single quotes for the search term.
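The same session can be reproduced from Python's built-in sqlite3 module. This is only a sketch and assumes the SQLite library that Python is linked against was compiled with FTS5; the helper name search is made up for the example, and the query uses the table name on the left of MATCH, which for this single-column table searches the same data as the column form above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same table definition as above: '-' is declared a token character,
# so bacon-and-eggs is indexed as one token.
conn.execute(
    """
    CREATE VIRTUAL TABLE blah
    USING fts5(term, tokenize="unicode61 tokenchars '-'")
    """
)
conn.executemany(
    "INSERT INTO blah (term) VALUES (?)",
    [("bacon-and-eggs",), ("bacon",), ("eggs",)],
)


def search(query):
    """Run an FTS5 query string and return the matching terms in insertion order."""
    rows = conn.execute(
        "SELECT term FROM blah WHERE blah MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]


print(search('"bacon-and-eggs"'))
print(search('"bacon"*'))
```

Binding the query string as a parameter removes one layer of quoting: only the double quotes required by the FTS5 query syntax remain.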

With the default tokenizers, FTS does not index non-alphanumeric characters; it treats them as word separators. Before sending the search term to FTS you can convert it to
bacon NEAR/0 and NEAR/0 eggs
to search for adjacent words.
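The conversion could look roughly like the following. The function name to_near_query is hypothetical, and the sketch assumes a tokenizer that folds case, so the words are lowercased to keep them from being read as operators such as AND or OR:

```python
import re


def to_near_query(term, distance=0):
    """Turn 'bacon-and-eggs' into 'bacon NEAR/0 and NEAR/0 eggs'."""
    # Keep runs of letters and digits; everything else acts as a separator.
    words = [word.lower() for word in re.findall(r"[^\W_]+", term)]
    return f" NEAR/{distance} ".join(words)


print(to_near_query("bacon-and-eggs"))
```

A single-word input is returned unchanged (apart from lowercasing), so the helper can be applied to every search term.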

Related

Mysql table collation change

I have an old table with several Spanish keywords. Its collation is latin1_swedish_ci.
The column with the keywords has a Primary index.
When I try to change the collation to utf8_general_ci, the change fails because it finds duplicates.
With that index in place, the change is not possible.
What happens is that, for example, "cañada" is taken as "canada" that already exists but they are different words.
That was using phpMyAdmin.
Another try was to export the table as file.sql and using
sed 's/STRING_SOURCE/STRING_REPLACE/'
but at the end mysql source gave me the same error (did expect that :))
I also tried that last approach with the entire database.
MySQL version 5.5.64-MariaDB
In phpMyAdmin I selected the database/table, opened the Structure tab, chose Change for the column with the keywords, and finally selected utf8_general_ci from the Collation drop-down.
How can I make this change keeping all the keywords?
Since you are focused on Spanish, use a Spanish collation, not a generic one: utf8_spanish_ci or utf8_spanish2_ci. They treat ñ as a separate letter between n and o. Other collations treat ñ and n as the same.
Meanwhile, ç=c.
However ll is treated as two l by utf8_spanish_ci, while it is treated as coming after lz by utf8_spanish2_ci. (Something about dictionary versus phonebook -- remember those artifacts from ancient history?)
Ref: http://mysql.rjweb.org/utf8_collations.html
Once you upgrade to 8.0, there will be two more choices: utf8mb4_es_0900_ai_ci and utf8mb4_es_trad_0900_ai_ci.
Ref: http://mysql.rjweb.org/utf8mb4_collations.html
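To check in advance which keywords would clash under an accent-insensitive collation, something like the sketch below can be run over an exported keyword list. It is only a rough approximation (it strips combining accents and case-folds) and does not reproduce MySQL's actual collation weights; the names fold and find_collisions are made up for the example:

```python
import unicodedata
from collections import defaultdict


def fold(word):
    """Approximate an accent- and case-insensitive comparison key."""
    decomposed = unicodedata.normalize("NFD", word)
    # Drop combining marks, so that ñ becomes n and á becomes a.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def find_collisions(keywords):
    """Group keywords that share a folded key and return only the clashing groups."""
    groups = defaultdict(list)
    for keyword in keywords:
        groups[fold(keyword)].append(keyword)
    return {key: words for key, words in groups.items() if len(words) > 1}


print(find_collisions(["canada", "cañada", "caña", "Madrid"]))
```

Any group reported here is a candidate for the duplicate-key error described in the question; a Spanish collation keeps the ñ/n pairs apart.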

How to query Unicode characters from SQL Server 2008

With NVARCHAR data type, I store my local language text in a column. I face a problem how to query that value from the database.
ዜናገብርኤልስ is stored value.
I wrote SQL like this
select DivisionName
from t_Et_Divisions
where DivisionName = 'ዜናገብርኤልስ'
select unicode (DivisionName)
from t_Et_Divisions
where DivisionName = 'ዜናገብርኤልስ'
The above didn't work. Does anyone have any ideas how to fix it?
Thanks!
You need to prefix your Unicode string literals with a N:
select DivisionName
from t_Et_Divisions
where DivisionName = N'ዜናገብርኤልስ'
This N prefix tells SQL Server to treat this string literal as a Unicode string and not convert it to a non-Unicode string (as it will if you omit the N prefix).
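The effect of that conversion can be imitated outside SQL Server. The sketch below assumes the database's default code page is Windows-1252 (a common default, but an assumption here) and uses Python's own codec to show what is left of the text once it is squeezed into that code page:

```python
text = "ዜናገብርኤልስ"

# Characters that do not exist in the target code page are replaced by '?',
# which is why the comparison without the N prefix cannot match the stored value.
narrowed = text.encode("cp1252", errors="replace").decode("cp1252")

print(narrowed)
print(narrowed == text)
```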
Update:
I still fail to understand what is not working according to you....
I tried setting up a table with an NVARCHAR column, and if I select, I get back that one, exact row match - as expected:
DECLARE @test TABLE (DivisionName NVARCHAR(100))
INSERT INTO @test (DivisionName)
VALUES (N'ዜናገብርኤልስ'), (N'ዜናገብርኤልስ,ኔትዎርክ,ከስተመር ስርቪስ'), (N'ኔትዎርክ,ከስተመር ስርቪስ')
SELECT *
FROM @test
WHERE DivisionName = N'ዜናገብርኤልስ'
This returns exactly one row - what else are you seeing, or what else are you expecting??
Update #2:
Ah - I see - the column contains multiple, comma-separated values - which is a horrible design mistake to begin with..... (violates first normal form of database design - don't do it!!)
And then you want to select all rows that contain that search term - but only display the search term itself, not the whole DivisionName column? Seems rather pointless..... try this:
select N'ዜናገብርኤልስ'
from t_Et_Divisions
where DivisionName LIKE N'%ዜናገብርኤልስ%'
The LIKE searches for rows that contain that value, and since you already know what you want to display, just put that value into the SELECT list ....

SQLite table and column name requirements

I'm wondering what constraints SQLite puts on table and column names when creating a table. The documentation for creating a table says that a table name can't begin with "sqlite_" but what other restrictions are there? Is there a formal definition anywhere of what is valid?
SQLite seems surprisingly accepting as long as the name is quoted. For example...
sqlite> create table 'name with spaces, punctuation & $pecial characters?'(x int);
sqlite> .tables
name with spaces, punctuation & $pecial characters?
If you use brackets or quotes you can use any name and there is no restriction :
create table [--This is a_valid.table+name!?] (x int);
But table names that are not bracketed or quoted should be an alphanumeric combination that doesn't start with a digit and does not contain any spaces.
You can use the underscore and $, but you cannot use symbols like: + - ? ! * # % ^ & = / \ : " '
From the sqlite doc,
If you want to use a keyword as a name, you need to quote it. There are four ways of quoting keywords in SQLite:
'keyword' A keyword in single quotes is a string literal.
"keyword" A keyword in double-quotes is an identifier.
[keyword] A keyword enclosed in square brackets is an identifier. This is not standard SQL. This quoting mechanism is used by MS Access and SQL Server and is included in SQLite for compatibility.
`keyword` A keyword enclosed in grave accents (ASCII code 96) is an identifier. This is not standard SQL. This quoting mechanism is used by MySQL and is included in SQLite for compatibility.
So, if you double-quote the table name you can use any characters. [tablename] can also be used, but it is not standard SQL.
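As a small illustration of the double-quote form, the sketch below builds a quoted identifier from an arbitrary name and creates a table with it. The helper name quote_identifier is hypothetical; doubling an embedded double quote is the standard SQL way of including one inside a quoted identifier:

```python
import sqlite3


def quote_identifier(name):
    """Return the name as a double-quoted SQL identifier."""
    # An embedded double quote is written twice inside the quoted form.
    return '"' + name.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")
table = 'name with spaces, "quotes" & $pecial characters?'
conn.execute(f"CREATE TABLE {quote_identifier(table)} (x INT)")

# sqlite_master stores the name without the surrounding quotes.
names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(names)
```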

How to escape a % sign in sqlite?

I do a full text search using LIKE clause and the text can contain a '%'.
What is a good way to search for a % sign in an sqlite database?
I did try
SELECT * FROM table WHERE text_string LIKE '%[%]%'
but that doesn't work in sqlite.
From the SQLite documentation
If the optional ESCAPE clause is present, then the expression following the ESCAPE keyword must evaluate to a string consisting of a single character. This character may be used in the LIKE pattern to include literal percent or underscore characters. The escape character followed by a percent symbol (%), underscore (_), or a second instance of the escape character itself matches a literal percent symbol, underscore, or a single escape character, respectively.
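In practice that means escaping the user's text before wrapping it in wildcards. A minimal sketch follows, using a backslash as the escape character; the table notes, its rows, and the helper names escape_like and contains are invented for the example:

```python
import sqlite3


def escape_like(text, escape="\\"):
    """Escape the LIKE wildcards (and the escape character itself) in user text."""
    # The escape character must be handled first so it is not doubled twice.
    for ch in (escape, "%", "_"):
        text = text.replace(ch, escape + ch)
    return text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (text_string TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?)",
    [("98.9% sure",), ("989 sure",), ("100_percent",)],
)


def contains(fragment):
    """Return the rows whose text contains the fragment literally."""
    pattern = "%" + escape_like(fragment) + "%"
    rows = conn.execute(
        "SELECT text_string FROM notes "
        "WHERE text_string LIKE ? ESCAPE '\\' ORDER BY rowid",
        (pattern,),
    )
    return [row[0] for row in rows]


print(contains("%"))
print(contains("_"))
```

Without the ESCAPE clause, the pattern '%%%' would match every non-NULL row, because each % is still a wildcard.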
We can achieve the same thing with the query below:
SELECT * FROM table WHERE instr(text_string, ?)>0
Here :
? => your search word
Example :
You can give text directly like
SELECT * FROM table WHERE instr(text_string, '%')>0
SELECT * FROM table WHERE instr(text_string, '98.9%')>0 etc.
Hope this helps better.
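With a bound parameter, the instr() variant needs no escaping at all. The sketch below uses a made-up table notes and a hypothetical helper contains_literal:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (text_string TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?)",
    [("98.9% sure",), ("989 sure",), ("100_percent",)],
)


def contains_literal(needle):
    """Return the rows where the needle occurs; % and _ have no special meaning."""
    rows = conn.execute(
        "SELECT text_string FROM notes "
        "WHERE instr(text_string, ?) > 0 ORDER BY rowid",
        (needle,),
    )
    return [row[0] for row in rows]


print(contains_literal("%"))
print(contains_literal("98.9%"))
```

One difference to keep in mind: instr() compares exactly, while LIKE is case-insensitive for ASCII letters by default.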

SQLite: which character can be ignored with FTS match in one word

I need to find a special character that, when placed in the middle of a word, is ignored by SQLite FTS match as if it did not exist, e.g.:
Text Body: book's
If my match string is 'books' I need to get result of "book's"..
No problem using porter or simple tokenizer.
I tried many characters for that like: book!s, book?s, book|s, book,s, book:s…, but when searching by match for 'books' no results of these returned.
I don't understand, why?
I am using: Contentless FTS4 Tables, and External Content FTS4 Tables, my text body has many characters in each word, should be changed to ignore it when searching..
I cannot change match query because I do not know where the special character in the word is. Also, I need to leave the original word length equal to the length of FTS Index word to use match info or snippet(); as such, I cannot remove these characters from text body.
The default tokenizers do not ignore punctuation characters but treat them as word separators.
So the text body or match string book's will end up as two words, book and s.
These will never match a single word like books.
To ignore characters like ', you have to install your own custom tokenizer.
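The splitting behaviour can be observed directly. The sketch below assumes Python's sqlite3 module is linked against an SQLite build with FTS3/FTS4 enabled; the table docs and the helper match are names invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(body, tokenize=simple)")
conn.execute("INSERT INTO docs (body) VALUES (?)", ("book's",))


def match(query):
    """Run an FTS query against the docs table and return the matching bodies."""
    rows = conn.execute("SELECT body FROM docs WHERE docs MATCH ?", (query,))
    return [row[0] for row in rows]


print(match("books"))     # the index holds the tokens book and s, not books
print(match("book"))      # the first token matches
print(match('"book s"'))  # the two tokens as an adjacent phrase also match
```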
