Why are Meteor's Match patterns capitalized? - meteor

Meteor's Match patterns are capitalized, e.g. Match.Any, Match.Optional.
This goes against the convention that functions whose names start with an uppercase letter are constructors and should be called with the new operator, as embodied in ESLint's new-cap rule with capIsNew set to true (the default setting).
What convention is Match following here?

Related

Extract mm/dd/yyyy and m/dd/yyyy dates from string in R [duplicate]

My regex pattern looks something like
<xxxx location="file path/level1/level2" xxxx some="xxx">
I am only interested in the part in quotes assigned to location. Shouldn't it be as easy as below without the greedy switch?
/.*location="(.*)".*/
Does not seem to work.
You need to make your regular expression lazy/non-greedy, because by default (.*) is greedy and will capture everything up to the last quote on the line, i.e. file path/level1/level2" xxxx some="xxx.
Instead you can make your dot-star non-greedy, which will make it match as few characters as possible:
/location="(.*?)"/
Adding a ? on a quantifier (?, * or +) makes it non-greedy.
Note: this is only available in regex engines which implement the Perl 5 extensions (Java, Ruby, Python, etc) but not in "traditional" regex engines (including Awk, sed, grep without -P, etc.).
location="(.*)" will match from the " after location= until the " after some="xxx unless you make it non-greedy.
So you either need .*? (i.e. make it non-greedy by adding ?) or better replace .* with [^"]*.
[^"] Matches any character except for a " <quotation-mark>
More generic: [^abc] - Matches any character except for an a, b or c
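The difference between the greedy, lazy and negated-class variants can be checked quickly. The following is a minimal sketch, assuming Python's re module as the regex engine; the sample tag is the one from the question, and the variable names are illustrative only.

```python
import re

tag = '<xxxx location="file path/level1/level2" xxxx some="xxx">'

# Greedy: .* runs on to the last quote on the line.
greedy = re.search(r'location="(.*)"', tag).group(1)

# Lazy: .*? stops at the first closing quote.
lazy = re.search(r'location="(.*?)"', tag).group(1)

# Negated class: [^"]* cannot cross a quote at all, so no laziness is needed.
negated = re.search(r'location="([^"]*)"', tag).group(1)

print(greedy)
print(lazy)
print(negated)
```

The negated class also works in engines that have no lazy quantifiers, which is why it is often the more portable choice.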
How about
.*location="([^"]*)".*
This avoids the unlimited search with .* and will match exactly to the first quote.
Use non-greedy matching, if your engine supports it. Add the ? inside the capture.
/location="(.*?)"/
A lazy quantifier (?) with no global flag is the answer: the match stops at the first, shortest candidate.
If you had the global flag /g, it would instead have returned every shortest match in the input.
Here's another way.
Here's the one you want. The leading [\s\S]*? is lazy, so it finds the first item:
[\s\S]*?(?:location="([^"]*)")[\s\S]*
Replace with: $1
Explanation: https://regex101.com/r/ZcqcUm/2
For completeness, this gets the last one. The leading [\s\S]* is greedy, so it finds the last item:
[\s\S]*(?:location="([^"]*)")[\s\S]*
Replace with: $1
Explanation: https://regex101.com/r/LXSPDp/3
There's only 1 difference between these two regular expressions and that is the ?
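To see the first-versus-last behaviour in action, here is a small sketch, assuming Python's re module and a made-up input containing two location attributes. Both patterns use a capture group around the quoted value, and Python writes the replacement as \1 rather than $1.

```python
import re

# Hypothetical input with two location attributes.
text = '<a location="first/path"> <b location="last/path">'

# Lazy prefix: consumes as little as possible, so the first location wins.
first_pattern = r'[\s\S]*?(?:location="([^"]*)")[\s\S]*'
# Greedy prefix: consumes as much as possible, so the last location wins.
last_pattern = r'[\s\S]*(?:location="([^"]*)")[\s\S]*'

first = re.sub(first_pattern, r'\1', text)
last = re.sub(last_pattern, r'\1', text)

print(first)
print(last)
```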
The other answers here fail to spell out a full solution for regex versions which don't support non-greedy matching. The non-greedy quantifiers (.*?, .+? etc) are a Perl 5 extension which isn't supported in traditional regular expressions.
If your stopping condition is a single character, the solution is easy; instead of
a(.*?)b
you can match
a[^ab]*b
i.e. specify a character class which excludes the starting and ending delimiters.
In the more general case, you can painstakingly construct an expression like
start(|[^e]|e(|[^n]|n(|[^d])))end
to capture a match between start and the first occurrence of end. Notice how the subexpression with nested parentheses spells out a number of alternatives which between them allow e only if it isn't followed by nd and so forth, and also take care to cover the empty string as one alternative which doesn't match whatever is disallowed at that particular point.
Of course, the correct approach in most cases is to use a proper parser for the format you are trying to parse, but sometimes, maybe one isn't available, or maybe the specialized tool you are using is insisting on a regular expression and nothing else.
Because you are using a quantified subpattern and, as described in the Perl documentation,
By default, a quantified subpattern is "greedy", that is, it will
match as many times as possible (given a particular starting location)
while still allowing the rest of the pattern to match. If you want it
to match the minimum number of times possible, follow the quantifier
with a "?" . Note that the meanings don't change, just the
"greediness":
*? //Match 0 or more times, not greedily (minimum matches)
+? //Match 1 or more times, not greedily
Thus, to make your quantified pattern match as little as possible, follow it with ?:
/location="(.*?)"/
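import re  # the third-party regex module accepts the same pattern
text = 'ask her to call Mary back when she comes back'
# (?i) ignores case, (?s) lets . match newlines, .*? stops at the first 'back'
p = r'(?i)(?s)call(.*?)back'
for match in re.finditer(p, text):
    # strip() drops the spaces captured around the name
    print(match.group(1).strip())
Output:
Mary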

Input masking in Sqlite

How can I restrict the input of the registration number column to a specific format such as AB-78? The first two characters must be letters and the last two must be digits. I tried [A-Z][A-Z]-[0-9][0-9] but it didn't work in SQLite.
Use the GLOB operator. It supports a limited set of matching patterns. You could add a CHECK constraint in the column definition (e.g. as part of the CREATE TABLE statement) that includes a GLOB expression, similar to
CHECK (column GLOB '[A-Za-z][A-Za-z]-[0-9][0-9]')
GLOB patterns are case sensitive, so I included both ranges of uppercase and lowercase characters. If you need a particular case, then just remove the other range in the character class.
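The constraint can be tried out without a database file. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the table name vehicles, the column name reg_no and the helper try_insert are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vehicles (
        reg_no TEXT NOT NULL
            CHECK (reg_no GLOB '[A-Za-z][A-Za-z]-[0-9][0-9]')
    )
    """
)

def try_insert(value):
    """Return True if the value passes the CHECK constraint, else False."""
    try:
        conn.execute("INSERT INTO vehicles (reg_no) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        # SQLite reports a failed CHECK constraint as an integrity error.
        return False

for candidate in ["AB-78", "ab-12", "A1-78", "AB-789", "AB78"]:
    print(candidate, try_insert(candidate))
```

Because GLOB has to match the whole value, inputs that are too long (AB-789) or lack the hyphen (AB78) are rejected along with ones that have a digit where a letter is required.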
See the online SQLite docs for more information about LIKE, REGEXP and GLOB. Information on GLOB patterns can also be found through a web search; there are many pages with more detail. I don't think the built-in GLOB function supports all named character classes.

Are regexs allowed in BNF and EBNF notations?

If I wanted for example to define the Lisp programming language, where a name can include even non-alphanumeric characters, should I list all the usable characters with a notation like:
validchar ::= "a" | "b" | "c" ... "-" | "*" | "$" ... ;
name = validchar, (validchar | digit)+;
Or am I allowed to use regexs, like:
validchar ::= "[^(^)^\s^\d]";
name ::= validchar, (validchar | digit)*;
Or even:
name ::= "[^(^)^\s^\d]", "[^(^)^\s]"*;
This would shorten it a lot, and it would include even characters like ₩, ¥, € and so on, which I can't list but are actually usable.
Whether this is allowed depends on the tool you are using that implements the (E)BNF notation.
Some tools are rather strict and stick to the original definition of (E)BNF, allowing at best Kleene * or + on language tokens. An additional point is that there is no requirement for classic (E)BNF to operate on characters as terminals.
Clearly it is convenient to be able to define some language tokens directly in terms of characters, and one can imagine (as you have) an EBNF in which one can write not only characters as terminals, but also regexes over characters.
Whether the tool you propose to use allows that... depends entirely on the tool. Many tools that process (E)BNF, such as YACC, are actually designed to work in conjunction with another tool, a "lexer generator" (for YACC, this is LEX or its free counterpart FLEX) that defines character sequences for tokens. With such tool pairs, the (E)BNF tool typically does not allow any mention of characters or regexes over them, but the lexer generator tool explicitly does allow character and regex specifications for tokens.
There are hundreds of (E)BNF and lexer generator tools, each with somewhat (or egregiously) different rules. Check the tool documentation.
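The lexer half of such a tool pair can be approximated in a few lines. This is only a sketch in Python, not the input format of any particular lexer generator: the token names and the tokenize helper are invented here, and the NAME pattern is one possible reading of the question's intent (a name starts with anything other than a parenthesis, whitespace or a digit, and continues with anything other than a parenthesis or whitespace).

```python
import re

# One named group per token kind; whitespace between tokens is skipped.
TOKEN_RE = re.compile(
    r"""
    (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<NUMBER>\d+)
  | (?P<NAME>[^()\s\d][^()\s]*)
    """,
    re.VERBOSE,
)

def tokenize(source):
    """Return a list of (token_kind, text) pairs for a Lisp-like source string."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]

print(tokenize("(set! price€ 42)"))
```

A grammar written on top of these tokens would then refer only to NAME, NUMBER and the parentheses, never to individual characters.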
Or write it the way you want to write it, and build your own (101st) tool.

Pyparsing: the differences between MatchFirst, Or, and oneOf

In Pyparsing, what are the differences between MatchFirst, Or, and oneOf
when the strings share leading characters, as in
word, wording, words?
Or(['word', 'wording', 'words'])
MatchFirst(['word', 'wording', 'words'])
oneOf(['word', 'wording', 'words'])
From the online docs (https://pythonhosted.org/pyparsing/)
MatchFirst - If two expressions match, the first one listed is the one that will match.
Or - If two expressions match, the expression that matches the longest string will be used.
oneOf - Helper to quickly define a set of alternative Literals, and makes sure to do longest-first testing when there is a conflict, regardless of the input order, but returns a MatchFirst for best performance.
MatchFirst tests the current parse location with each string in its constructor, stopping at the first one to match.
Or tests the current parse location against all of the strings given in its constructor, and will return the longest match.
oneOf generates a Regex or MatchFirst to match the longest match, by reordering the input list when there are alternatives with common start strings to test the longer string first.
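The three strategies can be mimicked without pyparsing to make the difference concrete. The functions below are a plain-Python sketch of the behaviour described above, not pyparsing's implementation; their names are invented, and the oneOf stand-in simply sorts longest-first, which is a simplification of the reordering pyparsing performs.

```python
def match_first(alternatives, text):
    """Like MatchFirst: return the first listed alternative that matches."""
    for alt in alternatives:
        if text.startswith(alt):
            return alt
    return None

def match_longest(alternatives, text):
    """Like Or: try every alternative and keep the longest match."""
    hits = [alt for alt in alternatives if text.startswith(alt)]
    return max(hits, key=len) if hits else None

def one_of(alternatives, text):
    """Like oneOf: reorder so longer strings are tried first, then match first."""
    reordered = sorted(alternatives, key=len, reverse=True)
    return match_first(reordered, text)

words = ["word", "wording", "words"]
for text in ["wording", "words"]:
    print(text, match_first(words, text), match_longest(words, text), one_of(words, text))
```

With the list in this order, the MatchFirst stand-in always stops at "word", while the other two return the full "wording" or "words".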
oneOf takes a str of space-separated alternatives and can be simplistically defined as
oneOf = lambda xs: Or(Literal(x) for x in xs.split(" "))
Or, by contrast, operates on expressions, i.e. ParserElement instances.
So you can see oneOf as a specialization of Or, or Or as a generalization of oneOf.
You can write oneOf('foo bar') as Literal('foo') ^ Literal('bar')
but you can't write every Or expression using oneOf.
MatchFirst is the same as Or except for how it resolves conflicts: Or yields the longest match, while MatchFirst returns the first match in definition order.
So
expr = Literal('bar') ^ Word(alphanums)
expr.parseString("barstool").asList() == ["barstool"]
but
expr = Literal('bar') | Word(alphanums)
expr.parseString("barstool").asList() == ["bar"]

Case insensitive token matching

Is it possible to set the grammar to match case-insensitively?
For example, a rule such as:
checkName = 'CHECK' Word;
would match check name as well as CHECK name.
Creator of PEGKit here.
The only way to do this currently is to use a Semantic Predicate in a round-about sort of way:
checkName = { MATCHES_IGNORE_CASE(LS(1), #"check") }? Word Word;
Some explanations:
Semantic Predicates are a feature lifted directly from ANTLR. The Semantic Predicate part is the { ... }?. These can be placed anywhere in your grammar rules. They should contain either a single expression or a series of statements ending in a return statement which evaluates to a boolean value. This one contains a single expression. If the expression evaluates to false, matching of the current rule (checkName in this case) will fail. A true value will allow matching to proceed.
MATCHES_IGNORE_CASE(str, regexPattern) is a convenience macro I've defined for your use in Predicates and Actions to do regex matches. It has a case-sensitive friend: MATCHES(str, regexPattern). The second argument is an NSString* regex pattern. Meaning should be obvious.
LS(num) is another convenience macro for your use in Predicates/Actions. It means fetch a Lookahead String, and the argument specifies how far to look ahead. So LS(1) means look ahead by 1. In other words, "fetch the string value of the first upcoming token the parser is about to try to match".
Notice that I'm still matching Word twice at the end there. The first Word is necessary for matching 'check' (even though it was already tested in the predicate, it was not matched and consumed). The second Word is for your name or whatever.
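The peek-then-consume logic of that rule can be sketched outside PEGKit. The following Python is purely illustrative and is not PEGKit's API: matches_ignore_case and parse_check_name are hypothetical helpers, the token list stands in for the parser's token stream, and treating the pattern as a whole-token match is an assumption.

```python
import re

def matches_ignore_case(text, pattern):
    """Hypothetical stand-in for a case-insensitive regex test on one token."""
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None

def parse_check_name(tokens):
    """Return the name after a case-insensitive 'check' keyword, or None."""
    # Predicate: peek at the first upcoming token without consuming it.
    if len(tokens) < 2 or not matches_ignore_case(tokens[0], r"check"):
        return None
    # The predicate consumed nothing, so both words still have to be matched:
    # the first is the keyword itself, the second is the name.
    _keyword, name = tokens[0], tokens[1]
    return name

print(parse_check_name("CHECK name".split()))
print(parse_check_name("check name".split()))
print(parse_check_name("verify name".split()))
```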
Hope that helps.

Resources