What's the difference between LIKE and GLOB in SQLite? - sqlite

What is the difference between the following two queries?
FROM COMPANY WHERE ADDRESS GLOB '*-*';
FROM COMPANY WHERE ADDRESS LIKE '%-%';
I know that, unlike the LIKE operator, GLOB is case sensitive. Is that the only difference?

The documentation says:
The GLOB operator is similar to LIKE but uses the Unix file globbing syntax for its wildcards. Also, GLOB is case sensitive, unlike LIKE.
And that's it.

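Another difference is that GLOB supports character classes, similar to those in regular expressions.
For example, to select fields which end with a digit, use GLOB '*[0-9]'.
To select fields which don't contain any digit, use NOT GLOB '*[0-9]*'; a pattern such as GLOB '[^0-9]' only matches a value consisting of a single non-digit character.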
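The differences can be checked with a small sketch using Python's built-in sqlite3 module and an in-memory database. The COMPANY and ADDRESS names come from the question; the rows and the addresses() helper are made up for illustration. The last query uses NOT GLOB '*[0-9]*' to exclude values containing a digit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COMPANY (ADDRESS TEXT)")
# Sample rows invented for this example.
conn.executemany(
    "INSERT INTO COMPANY (ADDRESS) VALUES (?)",
    [("Rich-Mond",), ("texas",), ("Houston 77",), ("no digits",)],
)


def addresses(where_clause):
    """Return the ADDRESS values matching the given WHERE clause, sorted."""
    sql = f"SELECT ADDRESS FROM COMPANY WHERE {where_clause} ORDER BY ADDRESS"
    return [row[0] for row in conn.execute(sql)]


# No letters in the pattern: only the wildcard syntax differs.
hyphen_glob = addresses("ADDRESS GLOB '*-*'")
hyphen_like = addresses("ADDRESS LIKE '%-%'")

# LIKE ignores ASCII case by default; GLOB does not.
like_texas = addresses("ADDRESS LIKE 'TEXAS'")
glob_texas = addresses("ADDRESS GLOB 'TEXAS'")

# Character classes are a GLOB feature.
ends_with_digit = addresses("ADDRESS GLOB '*[0-9]'")
no_digit = addresses("ADDRESS NOT GLOB '*[0-9]*'")

print(hyphen_glob, hyphen_like, like_texas, glob_texas, ends_with_digit, no_digit)
```

So for the two queries in the question the results should be identical, since '-' has no case.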

Related

Extract mm/dd/yyyy and m/dd/yyyy dates from string in R [duplicate]

My regex pattern looks something like
<xxxx location="file path/level1/level2" xxxx some="xxx">
I am only interested in the part in quotes assigned to location. Shouldn't it be as easy as below without the greedy switch?
/.*location="(.*)".*/
Does not seem to work.
You need to make your regular expression lazy/non-greedy, because by default, "(.*)" will match everything up to the last quote on the line, i.e. all of file path/level1/level2" xxxx some="xxx
Instead you can make your dot-star non-greedy, which will make it match as few characters as possible:
/location="(.*?)"/
Adding a ? on a quantifier (?, * or +) makes it non-greedy.
Note: this is only available in regex engines which implement the Perl 5 extensions (Java, Ruby, Python, etc) but not in "traditional" regex engines (including Awk, sed, grep without -P, etc.).
location="(.*)" will match from the " after location= until the " after some="xxx unless you make it non-greedy.
So you either need .*? (i.e. make it non-greedy by adding ?) or better replace .* with [^"]*.
[^"] Matches any character except for a " <quotation-mark>
More generic: [^abc] - Matches any character except for an a, b or c
How about
.*location="([^"]*)".*
This avoids the unlimited search with .* and will match exactly to the first quote.
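To see the three variants side by side, here is a minimal sketch with Python's re module (which supports the Perl-style lazy quantifier). The input string is the one from the question; the variable names are mine.

```python
import re

tag = '<xxxx location="file path/level1/level2" xxxx some="xxx">'

# Greedy: .* runs to the last quote on the line.
greedy = re.search(r'location="(.*)"', tag).group(1)

# Lazy: .*? stops at the first quote that lets the pattern match.
lazy = re.search(r'location="(.*?)"', tag).group(1)

# Negated class: cannot cross a quote at all, no lazy support needed.
negated = re.search(r'location="([^"]*)"', tag).group(1)

print(greedy)
print(lazy)
print(negated)
```

The lazy and negated-class versions should give the same capture here; the negated class also works in engines without lazy quantifiers.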
Use non-greedy matching, if your engine supports it. Add the ? inside the capture.
/location="(.*?)"/
Using the lazy quantifier ? without the global flag is the answer.
With the global flag /g, it would instead have matched every shortest-length match in the input.
Here's another way.
Here's the one you want. This is lazy: [\s\S]*?
The first item:
[\s\S]*?(?:location="([^"]*)")[\s\S]*
Replace with: $1
Explanation: https://regex101.com/r/ZcqcUm/2
For completeness, this gets the last one. This is greedy: [\s\S]*
The last item:
[\s\S]*(?:location="([^"]*)")[\s\S]*
Replace with: $1
Explanation: https://regex101.com/r/LXSPDp/3
There's only one difference between these two regular expressions, and that is the ?
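A rough Python equivalent of the two replacements, assuming a made-up three-line input. Both patterns need a capturing group around [^"]* for the replacement to refer to anything, and Python's re.sub writes the backreference as \1 rather than $1.

```python
import re

# Sample input invented for this example.
text = 'a location="first" b\nc location="second" d\ne location="last" f'

# Lazy prefix: stops at the first location="..." it can find.
first_pattern = r'[\s\S]*?(?:location="([^"]*)")[\s\S]*'
# Greedy prefix: backtracks from the end, so it lands on the last one.
last_pattern = r'[\s\S]*(?:location="([^"]*)")[\s\S]*'

# Each pattern consumes the whole input, so the result is just group 1.
first = re.sub(first_pattern, r'\1', text, count=1)
last = re.sub(last_pattern, r'\1', text, count=1)

print(first)
print(last)
```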
The other answers here fail to spell out a full solution for regex versions which don't support non-greedy matching. The non-greedy quantifiers (.*?, .+? etc) are a Perl 5 extension which isn't supported in traditional regular expressions.
If your stopping condition is a single character, the solution is easy; instead of
a(.*?)b
you can match
a[^ab]*b
i.e., specify a character class which excludes the starting and ending delimiters.
In the more general case, you can painstakingly construct an expression like
start(|[^e]|e(|[^n]|n(|[^d])))end
to capture a match between start and the first occurrence of end. Notice how the subexpression with nested parentheses spells out a number of alternatives which between them allow e only if it isn't followed by nd and so forth, and also take care to cover the empty string as one alternative which doesn't match whatever is disallowed at that particular point.
Of course, the correct approach in most cases is to use a proper parser for the format you are trying to parse, but sometimes, maybe one isn't available, or maybe the specialized tool you are using is insisting on a regular expression and nothing else.
Because you are using a quantified subpattern, and as described in the Perl documentation,
By default, a quantified subpattern is "greedy", that is, it will
match as many times as possible (given a particular starting location)
while still allowing the rest of the pattern to match. If you want it
to match the minimum number of times possible, follow the quantifier
with a "?" . Note that the meanings don't change, just the
"greediness":
*? //Match 0 or more times, not greedily (minimum matches)
+? //Match 1 or more times, not greedily
Thus, to allow your quantified pattern to make minimum match, follow it by ? :
/location="(.*?)"/
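import regex  # third-party module (pip install regex)

text = 'ask her to call Mary back when she comes back'
p = r'(?i)(?s)call(.*?)back'
# The lazy group stops at the first "back" after "call".
for match in regex.finditer(p, text):
    print(match.group(1))
Output (with a leading and a trailing space):
 Mary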

Input masking in Sqlite

How can I restrict the input of the registration number column to a specific format such as AB-78? The first two characters must be letters and the last two must be digits. I tried [A-Z][A-Z]-[0-9][0-9], but it didn't work in SQLite.
Use the GLOB operator. It supports a limited set of matching patterns. You could add a CHECK constraint in the column definition (e.g. as part of the CREATE TABLE statement) that includes a GLOB expression, similar to
CHECK (column GLOB '[A-Za-z][A-Za-z]-[0-9][0-9]')
GLOB patterns are case sensitive, so I included both ranges of uppercase and lowercase characters. If you need a particular case, then just remove the other range in the character class.
See the online docs for more information about LIKE, REGEXP and GLOB. Information on GLOB patterns can be found here or by doing a web search; there are many pages with more information. I don't think the built-in GLOB function supports all named character classes.
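A minimal sketch of such a CHECK constraint, run through Python's sqlite3 module. The vehicle table, the reg_no column and the try_insert() helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names, for illustration only.
conn.execute(
    """
    CREATE TABLE vehicle (
        reg_no TEXT NOT NULL
            CHECK (reg_no GLOB '[A-Za-z][A-Za-z]-[0-9][0-9]')
    )
    """
)


def try_insert(value):
    """Return True if the value passes the CHECK constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO vehicle (reg_no) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


candidates = ["AB-78", "ab-12", "A1-78", "AB-789", "ABC-78"]
results = {value: try_insert(value) for value in candidates}
print(results)
```

GLOB has to match the whole value, so longer strings such as AB-789 are rejected without any extra anchoring.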

SQLite3 regexp performance

How performant is the SQLite3 REGEXP operator?
For simplicity, assume a simple table with a single column pattern and an index
CREATE TABLE `foobar` (`pattern` TEXT);
CREATE UNIQUE INDEX `foobar_index` ON `foobar`(`pattern`);
and a query like
SELECT * FROM `foobar` WHERE `pattern` REGEXP 'foo.*'
I have been trying to compare and understand the output from EXPLAIN and it seems to be similar to using LIKE except it will be using regexp for matching. However, I am not fully sure how to read the output from EXPLAIN and I'm not getting a grasp of how performant it will be.
I understand it will be slow compared to an indexed WHERE `pattern` = 'foo' query, but is it slower than or similar to LIKE?
SQLite does not optimize WHERE ... REGEXP ... to use indexes. x REGEXP y is simply a function call; it's equivalent to regexp(y,x), with the pattern as the first argument. Also note that not all installations of SQLite have a regexp function defined, so using it (or the REGEXP operator) is not very portable. LIKE/GLOB, on the other hand, can take advantage of indexes for prefix queries, provided that some additional conditions are met:
The right-hand side of the LIKE or GLOB must be either a string literal or a parameter bound to a string literal that does not begin with a wildcard character.
It must not be possible to make the LIKE or GLOB operator true by having a numeric value (instead of a string or blob) on the left-hand side. This means that either:
the left-hand side of the LIKE or GLOB operator is the name of an indexed column with TEXT affinity, or
the right-hand side pattern argument does not begin with a minus sign ("-") or a digit.
This constraint arises from the fact that numbers do not sort in lexicographical order. For example: 9<10 but '9'>'10'.
The built-in functions used to implement LIKE and GLOB must not have been overloaded using the sqlite3_create_function() API.
For the GLOB operator, the column must be indexed using the built-in BINARY collating sequence.
For the LIKE operator, if case_sensitive_like mode is enabled then the column must be indexed using the BINARY collating sequence, or if case_sensitive_like mode is disabled then the column must be indexed using the built-in NOCASE collating sequence.
If the ESCAPE option is used, the ESCAPE character must be ASCII, or a single-byte character in UTF-8.
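One way to check this on your own build is to compare EXPLAIN QUERY PLAN output. The sketch below uses Python's sqlite3 module with the table and index from the question; the regexp() implementation and the plan() helper are my own additions for illustration, and the exact plan wording varies between SQLite versions.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foobar (pattern TEXT)")
conn.execute("CREATE UNIQUE INDEX foobar_index ON foobar(pattern)")


def regexp(pattern, value):
    """Illustrative backing function; X REGEXP Y calls regexp(Y, X)."""
    return value is not None and re.search(pattern, value) is not None


# Without a registered regexp() the REGEXP query would not even prepare.
conn.create_function("regexp", 2, regexp)


def plan(where_clause):
    """Return the detail column of the query plan as one string."""
    sql = f"EXPLAIN QUERY PLAN SELECT * FROM foobar WHERE {where_clause}"
    return " | ".join(row[-1] for row in conn.execute(sql))


glob_plan = plan("pattern GLOB 'foo*'")      # prefix pattern, BINARY index
regexp_plan = plan("pattern REGEXP 'foo.*'")  # opaque function call
like_plan = plan("pattern LIKE 'foo%'")       # depends on collation settings

print("GLOB:  ", glob_plan)
print("REGEXP:", regexp_plan)
print("LIKE:  ", like_plan)
```

On a default build I would expect the GLOB query to show a SEARCH using the index and the REGEXP query a full SCAN. The LIKE query usually scans as well here, because the index uses BINARY while LIKE is case insensitive by default.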

How to use "glob" operator as case insensitive

I am using the glob operator with the "?" wildcard character. The problem is that it is case sensitive.
So suppose I want to search for "Hola", then below query does not work.
select * from tableName where columnName glob 'ho?a';
I can use the LOWER or UPPER functions with columnName, but then it also fails for text which is a combination of lower and upper case letters.
Please give your inputs.
GLOB is case sensitive by design.
If you want case insensitive matching, use LIKE, with _ matching a single character:
select * from tableName where columnName like 'ho_a';
GLOB supports character classes:
SELECT * FROM tableName WHERE columnName GLOB '[hH][oO]?[aA]';
However, using LIKE would be easier, unless you actually need to use character classes in some other part of the pattern.
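If you do need GLOB, the character classes can be generated rather than typed by hand. This is a minimal sketch with Python's sqlite3 module; case_insensitive_glob() is a hypothetical helper that assumes the input pattern has no character classes of its own and only handles ASCII letters. The sample rows are made up.

```python
import sqlite3


def case_insensitive_glob(pattern):
    """Expand each ASCII letter into a two-case class, e.g. 'h' -> '[hH]'.

    Hypothetical helper: assumes the pattern contains no [...] classes.
    """
    parts = []
    for ch in pattern:
        if ch.isascii() and ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            parts.append(ch)  # wildcards and other characters stay as-is
    return "".join(parts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (columnName TEXT)")
conn.executemany(
    "INSERT INTO tableName VALUES (?)", [("Hola",), ("hOlA",), ("Hello",)]
)

glob_pattern = case_insensitive_glob("ho?a")
glob_rows = [row[0] for row in conn.execute(
    "SELECT columnName FROM tableName WHERE columnName GLOB ? ORDER BY rowid",
    (glob_pattern,),
)]
like_rows = [row[0] for row in conn.execute(
    "SELECT columnName FROM tableName WHERE columnName LIKE 'ho_a' ORDER BY rowid"
)]

print(glob_pattern, glob_rows, like_rows)
```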

Using curly brackets({}) for REGEX in drupal db_query

I have a WHERE clause in my query like this: "WHERE sth REGEXP '[0-9]{5,10}'"
When I run this query in phpMyAdmin it returns all matching records, but in Drupal it returns no results. I think it's because Drupal treats everything like "{sth}" as a table name.
how can I solve this problem?
Thanks
Your theory is correct.
Curly brackets used as repetition quantifiers in regexes are stripped like any other curly brackets. Pass the regex as an argument to db_query() instead, like this:
db_query('SELECT name from {users} WHERE std RLIKE "%s"', '[0-9]{5,10}');
(I've had to guess at the rest of your query.)

Resources