How performant is the SQLite3 REGEXP operator?
For simplicity, assume a simple table with a single column pattern and an index
CREATE TABLE `foobar` (`pattern` TEXT);
CREATE UNIQUE INDEX `foobar_index` ON `foobar`(`pattern`);
and a query like
SELECT * FROM `foobar` WHERE `pattern` REGEXP 'foo.*'
I have been trying to compare and understand the output from EXPLAIN and it seems to be similar to using LIKE except it will be using regexp for matching. However, I am not fully sure how to read the output from EXPLAIN and I'm not getting a grasp of how performant it will be.
I understand it will be slow compared to an indexed WHERE `pattern` = 'foo' query, but is it slower than or similar to LIKE?
SQLite does not optimize WHERE ... REGEXP ... to use indexes. x REGEXP y is simply a function call; it's equivalent to regexp(y, x), with the pattern passed as the first argument. Also note that not all installations of SQLite have a regexp function defined, so using it (or the REGEXP operator) is not very portable. LIKE/GLOB, on the other hand, can take advantage of indexes for prefix queries, provided that some additional conditions are met:
1. The right-hand side of the LIKE or GLOB must be either a string literal or a parameter bound to a string literal that does not begin with a wildcard character.
2. It must not be possible to make the LIKE or GLOB operator true by having a numeric value (instead of a string or blob) on the left-hand side. This means that either:
   a. the left-hand side of the LIKE or GLOB operator is the name of an indexed column with TEXT affinity, or
   b. the right-hand side pattern argument does not begin with a minus sign ("-") or a digit.
   This constraint arises from the fact that numbers do not sort in lexicographical order. For example: 9<10 but '9'>'10'.
3. The built-in functions used to implement LIKE and GLOB must not have been overloaded using the sqlite3_create_function() API.
4. For the GLOB operator, the column must be indexed using the built-in BINARY collating sequence.
5. For the LIKE operator, if case_sensitive_like mode is enabled then the column must be indexed using the BINARY collating sequence, or if case_sensitive_like mode is disabled then the column must be indexed using the built-in NOCASE collating sequence.
6. If the ESCAPE option is used, the ESCAPE character must be ASCII, or a single-byte character in UTF-8.
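To check the difference between REGEXP and an indexable prefix match on a given installation, a minimal sketch along these lines may help. It uses Python's sqlite3 module; the regexp and query_plan helpers are illustrative names, not part of SQLite, and the regexp function assumes "match anywhere" semantics via re.search. The exact wording of the plan text varies between SQLite versions, so only the SCAN/SEARCH keyword is meaningful here.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates `X REGEXP Y` as regexp(Y, X): the pattern comes first.
    # Assumption: "match anywhere" semantics, as provided by re.search.
    return value is not None and re.search(pattern, value) is not None


def query_plan(conn, sql):
    # The human-readable detail is the last column of each plan row.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)  # REGEXP is undefined without this
conn.execute("CREATE TABLE foobar (pattern TEXT)")
conn.execute("CREATE UNIQUE INDEX foobar_index ON foobar(pattern)")
conn.executemany("INSERT INTO foobar VALUES (?)",
                 [("foo",), ("foobar",), ("bar",)])

regexp_plan = query_plan(
    conn, "SELECT * FROM foobar WHERE pattern REGEXP 'foo.*'")
glob_plan = query_plan(
    conn, "SELECT * FROM foobar WHERE pattern GLOB 'foo*'")
regexp_matches = [row[0] for row in conn.execute(
    "SELECT pattern FROM foobar WHERE pattern REGEXP 'foo.*' ORDER BY pattern")]

print("REGEXP:", regexp_plan)  # every row is visited and passed to regexp()
print("GLOB:  ", glob_plan)    # a range search on the index
```

GLOB is used for the indexed case because the index above has the default BINARY collation; per the conditions listed, LIKE would need a NOCASE index or case_sensitive_like enabled to get the same treatment.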
Related
In SQLite you can use named parameters in statements, like this (Python example):
cur.execute("insert into lang values (:foo, :bar)", {'foo': 'a', 'bar': 2})
Is there any way to have parameter names containing spaces? I.e:
cur.execute("insert into lang values (:'foo bar')", {'foo bar': 'a'})
The documentation suggests not but you never know.
Apparently for the $AAA form you can:
The identifier name in this case can include one or more occurrences of "::" and a suffix enclosed in "(...)" containing any text at all.
But that doesn't let you have an arbitrary name since the brackets are still part of the name. So the answer appears to be no.
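If the keys come from somewhere you don't control, one possible workaround is to map them to safe placeholder names before binding. The following is only a sketch of that idea; sanitize_params is a hypothetical helper, not part of the sqlite3 module, and the SQL text has to use the sanitized names.

```python
import re
import sqlite3


def sanitize_params(params):
    # Hypothetical helper: replace every non-identifier character in a key
    # with "_" so the key can be used as a :name placeholder.
    safe = {}
    for key, value in params.items():
        name = re.sub(r"\W", "_", key)
        if name in safe:
            raise ValueError(f"parameter names collide after sanitizing: {key!r}")
        safe[name] = value
    return safe


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lang (name TEXT)")

# The key 'foo bar' becomes 'foo_bar', which matches the :foo_bar placeholder.
conn.execute("INSERT INTO lang VALUES (:foo_bar)",
             sanitize_params({"foo bar": "a"}))

inserted = conn.execute("SELECT name FROM lang").fetchall()
```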
I have a task to translate some Teradata scripts to BigQuery SQL. However, I can't find what the syntax with pound sign in the name of the alias means.
SELECT
A AS SOME_COLUMN_1
,B AS SOME_COLUMN_2
,C AS SOME_COLUMN_3# /* <------- HERE */
,COUNT(*) AS E FROM
SOME_DB.SOME_TABLE;
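It has no special meaning: '#', '$' and '_' are simply allowed characters in a Teradata object name, in addition to 'a'-'z' and '0'-'9'.
If BigQuery doesn't support SOME_COLUMN_3# as an object name, you can either rename it or try quoting it. Standard SQL quotes identifiers with double quotes ("SOME_COLUMN_3#"), while BigQuery uses backticks for quoted identifiers (`SOME_COLUMN_3#`).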
Double quoted names can include almost any character and allow using of reserved keywords as names like a table named "table".
Caution: In Standard SQL double quoted names are case sensitive, but not in Teradata, e.g. "a" and "A" are different names in Standard SQL, but the same in Teradata.
How can I restrict the input of the registration number column to a specific format such as AB-78? The first two characters must be letters and the last two must be digits. I tried LIKE with [A-Z][A-Z]-[0-9][0-9], but it didn't work in SQLite.
Use the GLOB operator. It supports a limited set of matching patterns. You could add a CHECK constraint in the column definition (e.g. as part of the CREATE TABLE statement) that includes a GLOB expression, similar to
CHECK (column GLOB '[A-Za-z][A-Za-z]-[0-9][0-9]')
GLOB patterns are case sensitive, so I included both ranges of uppercase and lowercase characters. If you need a particular case, then just remove the other range in the character class.
See the online SQLite docs for more information about LIKE, REGEXP and GLOB. Information on GLOB patterns is easy to find with a web search; there are many pages with more details. I don't think the built-in GLOB function supports all named character classes.
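As a quick way to try the constraint, here is a minimal sketch using Python's sqlite3 module. The table name vehicles, the column name reg_no and the try_insert helper are made up for illustration, and the pattern is restricted to uppercase letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE vehicles (
        reg_no TEXT NOT NULL
            CHECK (reg_no GLOB '[A-Z][A-Z]-[0-9][0-9]')
    )
""")


def try_insert(value):
    # Returns True if the row was accepted, False if the CHECK rejected it.
    try:
        conn.execute("INSERT INTO vehicles (reg_no) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


# GLOB has to match the whole value, so longer or shorter strings fail too.
results = {value: try_insert(value)
           for value in ["AB-78", "ab-78", "A1-78", "AB-789", "AB78"]}
print(results)
```

Only 'AB-78' should be accepted here; the lowercase variant is rejected because GLOB is case sensitive.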
I would like to query an SQLite table that contains directory paths to find all the paths under some hierarchy. Here's an example of the contents of the column:
/alpha/papa/
/alpha/papa/tango/
/alpha/quebec/
/bravo/papa/
/bravo/papa/uniform/
/charlie/quebec/tango/
If I search for everything under /bravo/papa/, I would like to get:
/bravo/papa/
/bravo/papa/uniform/
I am currently trying to do this like so (see below for the long story of why I can't use simpler methods):
SELECT * FROM Files WHERE Path >= '/bravo/papa/' AND Path < '/bravo/papa0';
This works. It looks a bit weird, but it works for this example. '0' is the Unicode code point 1 greater than '/'. When ordered lexicographically, all the paths starting with '/bravo/papa/' compare greater than or equal to it and less than '/bravo/papa0'. However, in my tests, I find that this breaks down when we try this:
SELECT * FROM Files WHERE Path >= '/' AND Path < '0';
This returns no results, but it should return every row. As far as I can tell, the problem is that SQLite is treating '0' as a number, not a string. If I use '0Z' instead of '0', for example, I do get results, but I introduce a risk of getting false positives. (For example, if there actually was an entry '0'.)
The simple version of my question is: is there some way to get SQLite to treat '0' in such a query as the length-1 string containing the Unicode character '0' (which should sort after strings such as '!', '*' and '/', but before '1', '=' and 'A') instead of the integer 0 (which SQLite sorts before all strings)?
I think in this case I can actually get away with special-casing a search for everything under '/', since all my entries will always start with '/', but I'd really like to know how to avoid this sort of thing in general, as it's unpleasantly surprising in all the same ways as JavaScript's "==" operator.
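For reference, the range trick described above can be sketched as follows in Python. The prefix_range and paths_under helpers are illustrative names, and the sketch assumes the Path column is declared as TEXT with the default BINARY collation.

```python
import sqlite3


def prefix_range(prefix):
    # Upper bound: the prefix with its last character replaced by the next
    # code point, e.g. '/bravo/papa/' -> '/bravo/papa0'.
    # Edge cases (empty prefix, last character with no valid successor)
    # are not handled beyond this check.
    if not prefix:
        raise ValueError("prefix must not be empty")
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Files (Path TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Files VALUES (?)", [
    ("/alpha/papa/",), ("/alpha/papa/tango/",), ("/alpha/quebec/",),
    ("/bravo/papa/",), ("/bravo/papa/uniform/",), ("/charlie/quebec/tango/",),
])


def paths_under(prefix):
    low, high = prefix_range(prefix)
    rows = conn.execute(
        "SELECT Path FROM Files WHERE Path >= ? AND Path < ? ORDER BY Path",
        (low, high))
    return [row[0] for row in rows]


under_papa = paths_under("/bravo/papa/")
under_root = paths_under("/")
print(under_papa)
print(len(under_root))
```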
First approach
A more natural approach would be to use the LIKE or GLOB operator. For example:
SELECT * FROM Files WHERE Path LIKE @prefix || '%';
But I want to support all valid path characters, so I would need to use ESCAPE for the '_' and '%' symbols. Apparently this prevents SQLite from using an index on Path. (See http://www.sqlite.org/optoverview.html#like_opt ) I really want to be able to benefit from an index here, and it sounds like that's impossible using either LIKE or GLOB unless I can guarantee that none of their special characters will occur in the directory name, and POSIX allows anything other than NUL and '/', even GLOB's '*' and '?' characters.
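For completeness, the escaping that this approach would need can be sketched like this. escape_like is a hypothetical helper, the backslash is an arbitrary choice of ESCAPE character, and the sketch says nothing about whether an index is used. Note that LIKE is case-insensitive for ASCII letters by default.

```python
import sqlite3


def escape_like(text, escape="\\"):
    # Escape the escape character first, then the two LIKE wildcards.
    for ch in (escape, "%", "_"):
        text = text.replace(ch, escape + ch)
    return text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Files (Path TEXT)")
conn.executemany("INSERT INTO Files VALUES (?)",
                 [("/a_b/",), ("/a_b/c/",), ("/axb/",)])

prefix = "/a_b/"

# Without escaping, '_' matches any single character, so '/axb/' slips in.
naive = [row[0] for row in conn.execute(
    "SELECT Path FROM Files WHERE Path LIKE ? ORDER BY Path",
    (prefix + "%",))]

# With escaping, '_' in the prefix only matches a literal underscore.
escaped = [row[0] for row in conn.execute(
    "SELECT Path FROM Files WHERE Path LIKE ? ESCAPE '\\' ORDER BY Path",
    (escape_like(prefix) + "%",))]

print(naive)
print(escaped)
```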
I'm providing this for context. I'm interested in other approaches to solve the underlying problem, but I'd prefer to accept an answer that directly addresses the ambiguity of strings-that-look-like-numbers in SQLite.
Similar questions
How do I prevent sqlite from evaluating a string as a math expression?
In that question, the values weren't quoted. I get these results even when the values are quoted or passed in as parameters.
EDIT - See my answer below. The column was created with the invalid type "STRING", which SQLite treated as NUMERIC.
* Groan *. The column had NUMERIC affinity because it had accidentally been specified as "STRING" instead of "TEXT". Since SQLite didn't recognize the type name, it made it NUMERIC, and because SQLite doesn't enforce column types, everything else worked as expected, except that any time a number-like string is inserted into that column it is converted into a numeric type.
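The effect is easy to reproduce. The sketch below, using Python's sqlite3 module with made-up table and column names, stores the same values in a column declared STRING and in one declared TEXT, then runs the range query from the question against both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "STRING" is not a recognized type name, so the column gets NUMERIC affinity.
conn.execute("CREATE TABLE t (as_string STRING, as_text TEXT)")
conn.execute("INSERT INTO t VALUES ('0', '0')")
stored_types = conn.execute(
    "SELECT typeof(as_string), typeof(as_text) FROM t").fetchone()

paths = [("/alpha/papa/",), ("/alpha/papa/tango/",), ("/alpha/quebec/",),
         ("/bravo/papa/",), ("/bravo/papa/uniform/",),
         ("/charlie/quebec/tango/",)]
conn.execute("CREATE TABLE files_string (Path STRING)")
conn.execute("CREATE TABLE files_text (Path TEXT)")
conn.executemany("INSERT INTO files_string VALUES (?)", paths)
conn.executemany("INSERT INTO files_text VALUES (?)", paths)


def count_under_root(table):
    # table is one of the two fixed names above, not user input.
    sql = f"SELECT count(*) FROM {table} WHERE Path >= ? AND Path < ?"
    return conn.execute(sql, ("/", "0")).fetchone()[0]


string_count = count_under_root("files_string")
text_count = count_under_root("files_text")

print(stored_types)               # how the value '0' was stored in each column
print(string_count, text_count)   # matches found with STRING vs TEXT
```

In the STRING column the bound '0' is compared as the integer 0, which sorts before every text value, so no path is less than it; in the TEXT column it stays a string and every path matches.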
I would like to know if it is possible to use [] in an SQLite query, as we can in Access and other databases.
e.g. SELECT * FROM mytable WHERE fwords like '%b[e,i,a]d%'
This should retrieve all rows whose fwords value contains bad, bed, or bid.
Thanks a lot
From http://www.sqlite.org/lang_expr.html:
The LIKE operator does a pattern matching comparison. The operand to the right of the LIKE operator contains the pattern and the left hand operand contains the string to match against the pattern. A percent symbol ("%") in the LIKE pattern matches any sequence of zero or more characters in the string. An underscore ("_") in the LIKE pattern matches any single character in the string. Any other character matches itself or its lower/upper case equivalent (i.e. case-insensitive matching).
Does that help?
You can have a look at the regex section here.
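Since LIKE in SQLite only understands % and _, one alternative worth trying is the GLOB operator, which does support bracketed character classes (with * and ? as wildcards) but is case sensitive. A minimal sketch in Python, with the sample rows made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (fwords TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)", [
    ("a bad day",), ("bed time",), ("bid now",),
    ("bud",), ("b,d",), ("BAD",),
])

# [eia] matches exactly one of e, i, a. No commas are needed inside the
# brackets; a comma there would be treated as one more allowed character.
matches = [row[0] for row in conn.execute(
    "SELECT fwords FROM mytable WHERE fwords GLOB '*b[eia]d*' ORDER BY rowid")]

print(matches)
```

'BAD' is not returned because GLOB is case sensitive; adding the uppercase letters to the class, or lowercasing the column with lower(), would be one way around that.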