Retrieve distinct tokens from SQLite3 column

I want to add a new feature to my bookmarking utility, Buku: retrieve all distinct tags.
Buku uses SQLite3.
A bookmark entry can have multiple tags separated by commas (,) in the same column tags.
Instead of retrieving the distinct values from column tags and then parsing them, is there any way I can tokenize the tags by comma and retrieve the distinct tags?
Any help is much appreciated.

SQLite3 has no built-in 'split' function. There is only instr(X, Y), which returns the position of the first occurrence only, and substr. If the number of tags in the field is constant, you can write a complicated query that splits the string into rows and then selects the distinct values from them.
So the answer is no: don't try to do it in the database engine. You should either change the table structure or parse the values after retrieving them from the database.
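As a rough illustration of the "parse after retrieving" route, here is a minimal Python sketch. The table name bookmarks, the url column, and the sample rows are assumptions made up for the example; only the comma-separated tags column comes from the question.

```python
import sqlite3


def distinct_tags(conn):
    """Return the sorted individual tags found in the tags column."""
    tags = set()
    # DISTINCT only removes duplicate tag strings; the split happens in Python.
    for (raw,) in conn.execute("SELECT DISTINCT tags FROM bookmarks"):
        if raw:
            tags.update(t.strip() for t in raw.split(",") if t.strip())
    return sorted(tags)


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookmarks (url TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO bookmarks VALUES (?, ?)",
    [("a", "linux,sqlite"), ("b", "sqlite, python"), ("c", "")],
)
print(distinct_tags(conn))  # ['linux', 'python', 'sqlite']
```

The set takes care of deduplication, so the SQL side stays a plain SELECT.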

Related

Lossless SQLite FTS5 search of a substring

Using a FTS5 virtual table returns nothing for postfix searches.
It can only search for entire word tokens, or for prefixes of word tokens if I append * to the search term.
For example, it does not find the qwerty.png row if I search for werty.
CREATE TABLE IF NOT EXISTS files (name TEXT, id INTEGER);
INSERT INTO files (name, id) VALUES ('qwerty.png', 1), ('asdfgh.png', 2);
CREATE VIRTUAL TABLE IF NOT EXISTS names USING FTS5(name);
INSERT INTO names (name) SELECT name FROM files;
SELECT *
FROM names
WHERE name MATCH 'werty';
It only works for prefix searches (qwerty, qwer*, qwe*, ...).
I can't use * at the start of the search (*werty), since it produces an error.
Is it possible to make the indexed text search work as if I had used
SELECT *
FROM names
WHERE name like '%wert%';
?
I just want a fast substring search without a full table scan.
Perhaps try the experimental trigram tokenizer:
When using the trigram tokenizer, a query or phrase token may match any sequence of characters within a row, not just a complete token.
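A minimal Python sketch of that suggestion follows. It assumes the linked SQLite library includes FTS5 and is recent enough to ship the trigram tokenizer (3.34 or later, as far as I know); the helper name substring_search is made up for the example, and it returns None when the tokenizer is not available.

```python
import sqlite3


def substring_search(names, needle):
    """Return the names containing needle, using an FTS5 trigram index.

    Returns None if this SQLite build lacks FTS5 or the trigram tokenizer.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE VIRTUAL TABLE names USING fts5(name, tokenize='trigram')"
        )
    except sqlite3.OperationalError:
        return None
    conn.executemany("INSERT INTO names (name) VALUES (?)", [(n,) for n in names])
    # Quote the needle so FTS5 parses it as one string, not as query syntax.
    phrase = '"' + needle.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT name FROM names WHERE names MATCH ? ORDER BY rowid", (phrase,)
    )
    return [r[0] for r in rows]


result = substring_search(["qwerty.png", "asdfgh.png"], "werty")
print(result)
```

Note that, as I understand the FTS5 documentation, a full-text query shorter than three characters matches nothing with this tokenizer, so very short substrings still need another approach.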

Counting Columns in ColdFusion's QoQ

I have:
<cfspreadsheet action="read" src="#Trim(PathToExcelFile)#" query="Data">
How do I count the total number of columns in my "Data" query using ColdFusion Query of Queries? I need to check whether my users have used the correct Excel file format before inserting into my DB.
I'm using Oracle 11g and I can not do:
Select * From Data Where rownum < 2
If I could do that, I could create an array and count the columns, but running that script results in an error saying that there is no column named Rownum. Oracle does not allow me to use SELECT TOP 1.
I don't want to loop over 5000+ records just to count the columns of one row. I appreciate any help, thank you.
ColdFusion adds a few additional variables to its query results. One of them is named 'columnList' and contains a comma-separated list of the query columns that were returned.
From the documentation here
From that you should be able to count the number of columns easily, for example with #listlen(Data.columnList)#.

Why does SQLite full-text search (FTS4) treat angle brackets differently in a compound search?

I have an SQLite database using FTS4. It is used to store emails with message IDs of the form:
Searching for messages using the FTS MATCH syntax, I get a result from:
SELECT rowid FROM emails WHERE emails MATCH '<8200#comms.io>'
This returns the correct row. But when I try to find multiple emails, I get an empty response:
SELECT rowid FROM emails WHERE emails MATCH '<8200#comms.io> OR <8188#comms.io>'
Strangely though, I can search without the angle bracket characters. This returns both rows:
SELECT rowid FROM emails WHERE emails MATCH '8200#comms.io OR 8188#comms.io'
This works even though the angle brackets are present in the stored columns. I can find no mention that these are special characters in SQLite, and without the 'OR', the single-term search works fine.
Why are these characters treated differently in my compound search?
The default (simple) tokenizer reads alphanumerical characters and treats all others as word separators to be ignored.
So when searching for a message ID, you have to actually search for a phrase with multiple words (8200, comms, and io).
If you want to treat the entire message ID as a word, you have to write a custom tokenizer.
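To make the phrase approach concrete, here is a small Python sketch under the default simple tokenizer. The column name message_id, the sample rows, and the helper find_ids are assumptions for illustration. Each ID is wrapped in double quotes so that its words (8200, comms, io) are matched as one phrase, and the phrases are then combined with OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Default (simple) tokenizer; message_id is a hypothetical column name.
conn.execute("CREATE VIRTUAL TABLE emails USING fts4(message_id)")
conn.executemany(
    "INSERT INTO emails (message_id) VALUES (?)",
    [("<8200#comms.io>",), ("<8188#comms.io>",), ("<7000#other.org>",)],
)


def find_ids(conn, ids):
    """Match each message ID as a quoted phrase and OR the phrases together."""
    query = " OR ".join('"{}"'.format(i) for i in ids)
    rows = conn.execute(
        "SELECT rowid FROM emails WHERE emails MATCH ? ORDER BY rowid", (query,)
    )
    return [r[0] for r in rows]


print(find_ids(conn, ["<8200#comms.io>", "<8188#comms.io>"]))
```

Because the tokenizer discards the punctuation, the angle brackets inside the quotes do no harm; what matters is that the three words stay together as one phrase.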

Selecting empty character column in SQLite

I want to select some rows from a SQLite table, and add an empty character column at the same time, but I get an error. The statement is SELECT firstname, SPACE(100) AS mytext FROM Customers, and the error message is "No such function: space".
I can run the same command in SQL Server without any problems, and in SQLite I can select additional numeric columns without problems (e.g. SELECT firstname, 8 AS newfield ...), but not character columns.
Any help would be appreciated.
Regards, Alan
Functions are not standard across database engines; some will be the same, but most are not. A complete list of SQLite's core functions is here: http://www.sqlite.org/lang_corefunc.html. You can also create custom functions in C, C#, or whatever you're using.
There is no built-in version of SPACE. You need to create a custom function or use a string literal.
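As one possible way to create such a custom function, here is a sketch using Python's sqlite3 module, which lets you register a Python callable as an SQL function. The Customers table contents are invented for the example, and the helper space is my own stand-in for SQL Server's SPACE, not something SQLite provides.

```python
import sqlite3


def space(n):
    """Stand-in for SQL Server's SPACE(n): a string of n spaces (NULL in, NULL out)."""
    return None if n is None else " " * n


conn = sqlite3.connect(":memory:")
# Register the callable as a one-argument SQL function named SPACE.
conn.create_function("SPACE", 1, space)

# Hypothetical sample data, for illustration only.
conn.execute("CREATE TABLE Customers (firstname TEXT)")
conn.execute("INSERT INTO Customers VALUES ('Alan')")

row = conn.execute(
    "SELECT firstname, SPACE(100) AS mytext FROM Customers"
).fetchone()
print(len(row[1]))  # 100
```

If the column only needs to exist and its width does not matter, a plain literal such as SELECT firstname, '' AS mytext FROM Customers avoids the custom function entirely.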

sqlite Query optimisation

The query
SELECT * FROM Table WHERE Path LIKE 'geo-Africa-Egypt-%'
can be optimized as:
SELECT * FROM Table WHERE Path >= 'geo-Africa-Egypt-' AND Path < 'geo-Africa-Egypt-zzz'
But how can this be done:
SELECT * FROM foodDb WHERE Food LIKE '%apples%';
How can this be optimized?
One option is redundant data. If you often query for a fixed set of strings occurring in the middle of some column, add another column that records whether a particular string can be found in the other column.
Another option, for arbitrary but still tokenizable strings, is to create a dictionary table where you have the tokens (e.g. apples) and foreign key references to the rows of the actual table where each token occurs.
In general, SQLite is by design not very good at full text searches.
It would surprise me if it was faster, but you could try GLOB instead of LIKE and compare:
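The dictionary-table idea could look roughly like the following Python sketch. The food_tokens table, its index, the whitespace-based tokenization, and the sample rows are all assumptions for illustration; only foodDb and Food come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foodDb (id INTEGER PRIMARY KEY, Food TEXT);
-- Hypothetical dictionary table: one row per (token, source row) pair.
CREATE TABLE food_tokens (token TEXT, food_id INTEGER REFERENCES foodDb(id));
CREATE INDEX food_tokens_token ON food_tokens(token);
""")


def add_food(conn, text):
    """Insert a row and record its lower-cased, whitespace-separated tokens."""
    cur = conn.execute("INSERT INTO foodDb (Food) VALUES (?)", (text,))
    tokens = {w.lower() for w in text.split()}
    conn.executemany(
        "INSERT INTO food_tokens VALUES (?, ?)",
        [(t, cur.lastrowid) for t in tokens],
    )


def find_food(conn, token):
    """Look the token up through the indexed dictionary table."""
    rows = conn.execute(
        "SELECT f.Food FROM food_tokens t JOIN foodDb f ON f.id = t.food_id "
        "WHERE t.token = ? ORDER BY f.id",
        (token.lower(),),
    )
    return [r[0] for r in rows]


for text in ("green apples", "ripe bananas", "apples and pears"):
    add_food(conn, text)

print(find_food(conn, "apples"))  # ['green apples', 'apples and pears']
```

The lookup is an equality match on an indexed column, so it avoids the full scan, but it only finds whole tokens: a row containing "pineapples" would not be returned for "apples".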
SELECT * FROM foodDb WHERE Food GLOB '*apples*';
