How to use glob and find brace brackets in sqlite? - sqlite

I want to use a sqlite3 query like this:
select * from Log where Desc glob '*[ _.,:;!?-(){}[]<>''"]OK';
to find records that end with OK, like
OK
asdasda _OK
asda (OK
dasda [OK
dasda ]OK
but this fails when I put a closing bracket inside the character class, as in ... glob '*[ []]OK';
Any suggestions?

A comment hidden in the source code says:
Globbing rules:
* Matches any sequence of zero or more characters.
? Matches exactly one character.
[...] Matches one character from the enclosed list of characters.
[^...] Matches one character not in the enclosed list.
With the [...] and [^...] matching, a ] character can be included
in the list by making it the first character after [ or ^. A
range of characters can be specified using -. Example:
[a-z] matches any single lower-case letter. To match a -, make
it the last character in the list.
So, your records can be found with ... glob '*[] _.,:;!?(){}[<>''"-]OK'.
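A minimal sketch of that pattern, run through Python's built-in sqlite3 module purely for illustration. The in-memory table, the sample rows and the bound parameter are assumptions, not part of the question. Binding the pattern as a parameter avoids having to double the single quote as in the SQL literal above.

```python
import sqlite3

# In-memory stand-in for the asker's Log table (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Log ("Desc" TEXT)')
rows = ["OK", "asdasda _OK", "asda (OK", "dasda [OK", "dasda ]OK",
        "dasdaOK", "done-OK"]
conn.executemany('INSERT INTO Log ("Desc") VALUES (?)', [(r,) for r in rows])

# ] is the first character of the class and - is the last one,
# so both are treated as literal members of the list.
pattern = "*[] _.,:;!?(){}[<>'\"-]OK"
query = 'SELECT "Desc" FROM Log WHERE "Desc" GLOB ? ORDER BY rowid'
matched = [row[0] for row in conn.execute(query, (pattern,))]
print(matched)
```

The bare row OK is not returned, because the class requires exactly one delimiter character in front of OK. An extra condition such as Desc = 'OK' would be needed to include it.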

Related

REGEX pattern match in R for Course number

I need to identify course numbers that match the pattern xx.3xxxxxx.
These are some examples of the course numbers:
26.3730004
27.0210000
26.3730009
26.7114001
23.9610071
26.0A34430
23.3670005
26.0B05430
I tried many patterns. One example is the pattern below, which did not produce any match.
"[^0-9]{2}\Q.\E3[^0-9]+$"
I tried using grep and grepl. I actually need the code to return indexes.
This code shows my attempt to tag the rows that have matches:
Teacher$virtual[which(grepl("[^0-9]{2}\\Q.\\E3[^0-9]+$", Teacher$CourseNumber))] <- "1"
I need to remove every row from my dataframe whose course number has that pattern, XX.3XXXXXX.
But my code did not find any match. Can you please help me?
You should use
grepl("^[0-9]{2}\\.3", Teacher$CourseNumber)
Details:
^ - start of a string
[0-9]{2} - two digits
\\. - a dot (note that a regex escape needs a literal backslash, but inside a string literal, "...", a single backslash starts a string escape sequence, so the backslash must be doubled to produce the literal backslash char that the regex escape requires)
3 - a 3 char.
NOTE: If you want to use in-pattern quoting with \Q and \E (in between which all chars are treated literally) you need to use PCRE regex, add perl=TRUE and use
grepl("^[0-9]{2}\\Q.\\E3", Teacher$CourseNumber, perl=TRUE)
Now, the dot is treated as a literal dot, not a . metacharacter that matches any char but a line break char (in a PCRE regex, . does not match line break chars by default).
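For comparison, here is a rough Python sketch of the same match on the sample course numbers from the question. It is an illustration only, not the R answer itself. Python's re module has no \Q...\E quoting, so re.escape is used as a stand-in, and the indexes are 0-based, whereas R's which() is 1-based.

```python
import re

# Sample course numbers copied from the question.
course_numbers = ["26.3730004", "27.0210000", "26.3730009", "26.7114001",
                  "23.9610071", "26.0A34430", "23.3670005", "26.0B05430"]

# Two digits at the start, a literal dot, then a 3.
pattern = re.compile(r"^[0-9]{2}\.3")

# 0-based indexes of the matching entries.
matching_idx = [i for i, c in enumerate(course_numbers) if pattern.search(c)]

# Entries that remain after dropping the matches.
kept = [c for c in course_numbers if not pattern.search(c)]

# Stand-in for \Q.\E: let re.escape quote the dot.
quoted = re.compile(r"^[0-9]{2}" + re.escape(".") + "3")
same_idx = [i for i, c in enumerate(course_numbers) if quoted.search(c)]

print(matching_idx, kept)
```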
This simple expression would likely cover that:
^[0-9]{2}\.[3].+$
Here [3] is a one-character class that requires a 3 right after the dot. It would probably also work without the start and end anchors, although it could then match anywhere in the string:
[0-9]{2}\.[3].+
The pattern can be tightened or loosened if necessary.

How to split a string by dashes outside of square brackets

I would like to split strings like the following:
x <- "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"
by dash (-) on the condition that those dashes must not be contained inside a pair of []. The expected result would be
c("abc", "1230", "xyz", "[def-ghu-jkl---]", "[adsasa7asda12]", "s",
"[klas-bst-asdas foo]")
Notes:
There is no nesting of square brackets inside each other.
The square brackets can contain any characters / numbers / symbols except square brackets.
The other parts of the string are also variable so that we can only assume that we split by - whenever it's not inside [].
There's a similar question for python (How to split a string by commas positioned outside of parenthesis?) but I haven't yet been able to accurately adjust that to my scenario.
You could use look ahead to verify that there is no ] following sooner than a [:
-(?![^[]*\])
So in R:
strsplit(x, "-(?![^[]*\\])", perl=TRUE)
Explanation:
-: match the hyphen
(?! ): negative look ahead: if that part is found after the previously matched hyphen, it invalidates the match of the hyphen.
[^[]: match any character that is not a [
*: match any number of the previous
\]: match a literal ]. If this matches, it means we found a ] before finding a [. As all this happens in a negative look ahead, a match here means the hyphen is not a match. Note that a ] is a special character in regular expressions, so it must be escaped with a backslash (although it does work without escape, as the engine knows there is no matching [ preceding it -- but I prefer to be clear about it being a literal). And as backslashes have a special meaning in string literals (they also denote an escape), that backslash itself must be escaped again in this string, so it appears as \\].
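The same lookahead can be tried outside R as well. The following is a minimal Python sketch, assuming Python's re module as a stand-in for R's perl=TRUE mode. The [ inside the negated class is escaped here only to keep the pattern unambiguous.

```python
import re

x = "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"

# Split on a hyphen unless a ] follows before the next [,
# which would mean the hyphen sits inside a bracket pair.
split_parts = re.split(r"-(?![^\[]*\])", x)
print(split_parts)
```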
Instead of splitting, extract the parts:
library(stringr)
str_extract_all(x, "(\\[[^\\[]*\\]|[^-])+")
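A comparable extraction in Python, given as a sketch rather than as a translation of stringr. The group is made non-capturing here, because re.findall would otherwise return only the captured group instead of the whole match.

```python
import re

x = "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"

# A part is a run of either complete [...] blocks or single non-hyphen chars.
extracted = re.findall(r"(?:\[[^\[]*\]|[^-])+", x)
print(extracted)
```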
I am not familiar with the R language, but I believe it can do regex-based search and replace. Instead of struggling with a single regex split, I would go in 3 steps:
replace - in all [....] parts with an invisible char, like \x99
split by -
for each element of the split result (array/list), replace \x99 back with -
For the first step, you can find the parts by \[[^]]
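The three steps can be sketched in Python as follows. The bracket pattern \[[^\]]*\] used for the first step is an assumption about what the answer intends, and the placeholder must be a character that never occurs in the input.

```python
import re

PLACEHOLDER = "\x99"  # assumed not to occur anywhere in the input

x = "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"

# Step 1: hide the hyphens inside every [...] block.
protected = re.sub(r"\[[^\]]*\]",
                   lambda m: m.group(0).replace("-", PLACEHOLDER),
                   x)

# Step 2: split on the hyphens that are left.
# Step 3: put the hidden hyphens back in each piece.
three_step_parts = [p.replace(PLACEHOLDER, "-") for p in protected.split("-")]
print(three_step_parts)
```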

Finding words in a file that contain the word file and do not contain hyphens

I am trying to find words in a file that contain the word file and do not contain hyphens. The pattern looks correct to me, but my output shows all the words with a hyphen and the word file. My path name is /folder/file.txt.
cat /folder/file.txt | grep file[!-]*
! isn't the right operator for negation. You need ^.
file[!-]*
will match any string containing the word 'file' and zero or more instances of '!' or '-'.
So basically - anything with the word 'file' in it. If you want to negate a character class, you need to use ^. But the * then allows for zero of the 'not patterns'.
If the dash is immediately after the word file then:
file[^-]
will match:
file1243
somefilefilea
but not:
file-1234
I think what you may be missing is that * also allows zero repetitions, so the bracketed part of the pattern can be skipped entirely.
^file[^-]*$
might do what you're after?
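To see how the three patterns differ, here is a small Python sketch. It uses Python's re module rather than grep, on the assumption that these simple bracket expressions behave the same in both, and the sample lines are made up.

```python
import re

lines = ["file1243", "somefilefilea", "file-1234", "myfile"]

def matching(pattern):
    """Return the lines in which the pattern is found anywhere."""
    return [line for line in lines if re.search(pattern, line)]

# [!-]* allows zero of '!' or '-', so any line containing 'file' matches.
loose = matching(r"file[!-]*")

# 'file' must be followed by one character that is not a hyphen.
next_not_dash = matching(r"file[^-]")

# The whole line starts with 'file' and has no hyphen after it.
whole_line = matching(r"^file[^-]*$")

print(loose, next_not_dash, whole_line)
```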
https://www.regex101.com/ will let you test regular expressions.

Regular expression to allow dash sign

I currently need to allow a "-" sign in this regular expression ^[a-zA-Z0-9]*$.
You could use a regex like this:
^[a-z\d]+[-a-z\d]+[a-z\d]+$
The idea is to use the case-insensitive flag, so that a-z alone covers A-Za-z, and to use \d, the shorthand for 0-9.
So, basically, the regex is composed of three parts:
^[a-z\d]+ ---> Start with alphanumeric characters
[-a-z\d]+ ---> can continue with alphanumeric characters or dashes
[a-z\d]+$ ---> End with alphanumeric characters
Simply add it as the first character after the opening bracket: ^[-a-zA-Z0-9]*$
Or, to match one or more of letters/numbers with a dash in between: ^[a-zA-Z0-9]+-[a-zA-Z0-9]+$
A hyphen can be included immediately after the opening bracket [ or immediately before the closing bracket ] of the character class. You should not put it in the middle of the class, because it will then be treated as a range operator, and some regex engines may reject the pattern.
In your case both are valid solutions:
(^[-a-zA-Z0-9]*$) - hyphen at the start of the character class
(^[a-zA-Z0-9-]*$) - hyphen at the end of the character class
Demo:
http://regex101.com/r/yP3sH7/2
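A short Python sketch comparing the suggestions above on a few made-up inputs. The candidate strings are assumptions chosen to show where the patterns differ.

```python
import re

candidates = ["abc-123", "-abc", "abc-", "abc_123", "ab", ""]

# Hyphen placed first or last in the class: both mean a literal '-'.
dash_first = re.compile(r"^[-a-zA-Z0-9]*$")
dash_last = re.compile(r"^[a-zA-Z0-9-]*$")

# Three-part variant: must start and end with an alphanumeric character.
three_part = re.compile(r"^[a-z\d]+[-a-z\d]+[a-z\d]+$", re.IGNORECASE)

dash_first_ok = [s for s in candidates if dash_first.search(s)]
dash_last_ok = [s for s in candidates if dash_last.search(s)]
three_part_ok = [s for s in candidates if three_part.search(s)]

print(dash_first_ok, dash_last_ok, three_part_ok)
```

The two class-based patterns accept leading or trailing dashes and even the empty string, because of the * quantifier. The three-part pattern is stricter and needs at least three characters.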

SQLite query - using [] in SQLite queries

I would like to know if it is possible to use [] in an SQLite query, as we do in Access and other databases.
e.g. SELECT * FROM mytable WHERE fwords like '%b[e,i,a]d%'
This would retrieve all rows whose fwords contain bad, bed or bid.
Thanks a lot
From http://www.sqlite.org/lang_expr.html:
The LIKE operator does a pattern matching comparison. The operand to the right of the LIKE operator contains the pattern and the left hand operand contains the string to match against the pattern. A percent symbol ("%") in the LIKE pattern matches any sequence of zero or more characters in the string. An underscore ("_") in the LIKE pattern matches any single character in the string. Any other character matches itself or its lower/upper case equivalent (i.e. case-insensitive matching).
Does that help?
You can have a look at the regex section here.
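As the quoted documentation implies, LIKE has no bracket syntax, so [ and ] are matched literally. SQLite's GLOB operator does support character classes. The sketch below runs both through Python's sqlite3 module for illustration. The table contents are made up, and using GLOB here is a suggestion rather than something the answers above propose. Note that GLOB is case-sensitive and uses * and ? instead of % and _.

```python
import sqlite3

# Hypothetical sample data for the asker's mytable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (fwords TEXT)")
words = ["bad", "bed", "bid", "bud", "a bad day", "BED"]
conn.executemany("INSERT INTO mytable VALUES (?)", [(w,) for w in words])

# LIKE treats the brackets and commas as ordinary characters.
like_rows = [r[0] for r in conn.execute(
    "SELECT fwords FROM mytable WHERE fwords LIKE '%b[e,i,a]d%' ORDER BY rowid")]

# GLOB understands [eia] as "one of e, i, a" (no commas needed).
glob_rows = [r[0] for r in conn.execute(
    "SELECT fwords FROM mytable WHERE fwords GLOB '*b[eia]d*' ORDER BY rowid")]

print(like_rows, glob_rows)
```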
