Can someone give me an example/explanation what this regular expression does:
(?![#$])
This is part of <%(?![#$])(([^%]*)%)*?> which is what ASP.NET uses to parse server-side code blocks. I understand the second part of the expression but not the first.
I checked the documentation and found (?! ...) means a zero-width negative lookahead but I'm not entirely sure I understand what that means. Any input I tried so far that looks like <% ... %> seems to work - I wonder why this first sub-expression is even there.
Edit:
I came up with this expression for picking up ASP.NET expressions: <%.+?%> then I found the one Microsoft made (the above full expression in question). I'm trying to understand why they chose that particular expression when mine seems a lot simpler. (I'm trying to see if my expression ignores certain boundary conditions that the MS one doesn't.)
It's a negative lookahead assertion that matches if the next character is not # or $, but doesn't consume it.
It's very similar to the negative character class [^#$] except that the negative character class also consumes the character, preventing it from being matched by the rest of the expression.
To see the difference consider matching <%test%>.
The expression <%(?![#$])(([^%]*)%)*?> captures test%. (rubular)
The expression <%[^#$](([^%]*)%)*?> captures est% because the t was consumed by the negative character class. (rubular)
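The difference can also be reproduced outside rubular. The following is a minimal sketch using Python's re module rather than .NET (an assumption on my part; the two engines agree on these constructs). The helper name first_group and the sample input with Eval are made up for illustration. It also shows where the simpler <%.+?%> from the question's edit behaves differently: the lookahead rejects blocks that start with <%# or <%$, presumably so that those forms can be handled separately.

```python
import re

# The pattern quoted in the question, and the negated-class variant for comparison.
LOOKAHEAD = re.compile(r"<%(?![#$])(([^%]*)%)*?>")
NEGATED_CLASS = re.compile(r"<%[^#$](([^%]*)%)*?>")
# The simpler pattern from the question's edit.
SIMPLE = re.compile(r"<%.+?%>")


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(first_group(LOOKAHEAD, "<%test%>"))      # test%
print(first_group(NEGATED_CLASS, "<%test%>"))  # est%

# A block that starts with <%# is rejected by the lookahead
# but accepted by the simpler pattern.
print(LOOKAHEAD.search('<%# Eval("Name") %>'))       # None
print(SIMPLE.search('<%# Eval("Name") %>').group())  # <%# Eval("Name") %>
```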
The Context
I am climbing the Nearley learning curve and trying to write a grammar for a search query parser.
The Goal
I would like to write a grammar that is able to parse a query string that contains boolean operators (e.g. AND, OR, NOT). Let's use AND for this question as a trivial case.
For instance, the grammar should recognize these example strings as valid:
pants
pants AND socks
jumping jacks
The Attempt
My naive attempt looks something like this:
query ->
statement
| statement "AND" statement
statement -> .:+
The Problem
The above grammar attempt is ambiguous because .:+ will match literally any string.
What I really want is for the first condition to match any string that does not contain AND in it. Once "AND" appears I want to enter the second condition only.
The Question
How can I detect these two distinct cases without having ambiguous grammar?
I am worried I'm missing something fundamental; I can imagine a ton of use cases where we want arbitrary text split up by known operators.
Yeah, if you've got an escape hatch that could be literally anything, you're going to have a problem.
Somewhere you're going to want to define what your base set of tokens are, at least something like \S+ and then how those tokens can be composed.
The place I'd typically start for a parser is trying to figure out where recursion is accounted for in the parser, and what approach to parsing the lib you're relying on takes.
Looks like Nearley is an Earley parser, and as the Wikipedia entry for Earley parsers notes, they handle left-recursive rules efficiently.
This is just hazarding a guess, but something like this might get you to conjunction at least.
CONJUNCTION -> AND | OR
STATEMENT -> TOKENS | (TOKENS CONJUNCTION STATEMENT)
TOKENS -> [^()]+
A structure like this should be unambiguous and bans parentheses in tokens, unless they're surrounded by double quotes.
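To make the "define your base tokens first" idea concrete, here is a rough sketch in plain Python rather than Nearley (so it is not a drop-in grammar). It assumes tokens are whitespace-separated (\S+) and that operators are the exact uppercase words AND and OR; the names OPERATORS and parse_query are invented for this example.

```python
import re

# Assumption: operators are standalone, uppercase, whitespace-delimited words.
OPERATORS = {"AND", "OR"}


def parse_query(query):
    """Split a query into statements and the operators that join them."""
    tokens = re.findall(r"\S+", query)  # base tokens: runs of non-whitespace
    statements = [[]]
    operators = []
    for token in tokens:
        if token in OPERATORS:
            # An operator closes the current statement and opens a new one.
            operators.append(token)
            statements.append([])
        else:
            statements[-1].append(token)
    return [" ".join(words) for words in statements], operators


print(parse_query("pants"))            # (['pants'], [])
print(parse_query("pants AND socks"))  # (['pants', 'socks'], ['AND'])
print(parse_query("jumping jacks"))    # (['jumping jacks'], [])
```

Because the operator check happens at the token level, a statement can never swallow an AND, which is the ambiguity the .:+ rule ran into.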
I have an ifElse Statement which can be of the following two types
a) ifElse(condition, expression_bool_result, expression_bool_result)
where expression_bool_result may be TRUE/FALSE, the result of and(), or(), ==, !=, ... or a further ifElse
b) ifElse(condition, expression_arith_result, expression_arith_result)
where expression_arith_result may be any number, the result of calculations or of further functions returning a number, ... (or a further ifElse)
Since I am new to JavaCC, I would like to ask what a production could look like that allows the parser to make a clear decision.
Currently I get the warning
Warning: Choice conflict involving two expansions at
line 824, column 5 and line 825, column 5 respectively.
A common prefix is: "ifElse" "("
Consider using a lookahead of 3 or more for earlier expansion.
which - as far as I can tell - implies that my grammar (regarding ifElse) is ambiguous.
If there is no way to write it unambiguously, what could the suggested lookahead look like?
Thanks for your feedback in advance!
No fixed amount of lookahead could possibly resolve this ambiguity in all cases. You could have an arbitrarily long stream of tokens that form a valid expression_arith_result - but is then followed by a comparison operator and another arithmetic value, thus turning it into an expression_bool_result.
The solution would be to have a single ifElse statement, that takes two arbitrary expressions. The required agreement in type between the two expressions would be a matter of semantics, not grammar.
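As an illustration of moving the agreement check out of the grammar and into semantics, here is a small sketch in Python (not JavaCC). It assumes the parser has already produced a tree of nested tuples such as ("ifElse", cond, a, b); that representation, the operator sets, and the name type_of are all hypothetical.

```python
# Assumed tree shape: literals are Python bools/numbers,
# everything else is a tuple (operator, operand, ...).
BOOL_OPS = {"and", "or", "==", "!="}
ARITH_OPS = {"+", "-", "*", "/"}


def type_of(expr):
    """Return "bool" or "arith" for an expression tree, or raise on a mismatch."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(expr, bool):
        return "bool"
    if isinstance(expr, (int, float)):
        return "arith"
    op, *args = expr
    if op == "ifElse":
        condition, then_branch, else_branch = args
        if type_of(condition) != "bool":
            raise TypeError("ifElse condition must be boolean")
        then_type = type_of(then_branch)
        if then_type != type_of(else_branch):
            raise TypeError("ifElse branches must have the same type")
        return then_type
    # Simplification: operand types of the other operators are not checked here.
    if op in BOOL_OPS:
        return "bool"
    if op in ARITH_OPS:
        return "arith"
    raise ValueError(f"unknown operator: {op!r}")


print(type_of(("ifElse", ("==", 1, 2), 3, ("+", 4, 5))))  # arith
print(type_of(("ifElse", True, False, ("and", True, False))))  # bool
```

With this split, the grammar only needs one ifElse production, and the choice conflict on the common prefix goes away.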
Jason's answer is correct in that you can't resolve the choice with a fixed length of lookahead. However JavaCC does not limit you to fixed length of lookahead. So you can do the following.
void IfExpression() :
{ }
{ LOOKAHEAD( <IFELSE> "(" Condition() "," BooleanExpression() )
BooleanIfExpression()
|
ArithmeticIfExpression()
}
How can I write a substring function in OCaml without using any assignments, lists, or iterations, only recursion? I can only use String.length.
What I have tried so far is:
let substring s s2 start stop=
if(start < stop) then
substring s s2 (start+1) stop
else s2;;
but obviously it is wrong. The problem is: how can I pass along the string that is being built up gradually by the recursive calls?
This feels like a homework problem that is intended to teach you to think about recursion. For me it would be easier to think about the recursion part if you decide on the basic operations you're going to use. You can't use assignments, lists, or iterations, okay. You need to extract parts of your input string somehow, but you obviously can't use the built-in substring function to do this, that would defeat the purpose of the exercise. The only other operation I can think of is the one that extracts a single character from a string:
# "abcd".[2];;
- : char = 'c'
You also need a way to add a character to a string, giving a longer string. But you're not allowed to use assignment to do this. It seems to me you're going to have to use String.make to translate your character to a string:
# String.make 1 'a';;
- : string = "a"
Then you can concatenate two strings using the ^ operator:
# "abc" ^ "def";;
- : string = "abcdef"
Are you allowed to use these three operations? If so, you can start thinking about the recursion part of the substring problem. If not, then I probably don't understand the problem well enough yet to give advice. (Or maybe whoever set up the restrictions didn't expect you to have to calculate substrings? Usually the restrictions are also a kind of hint as to how you should proceed.)
Moving on to your specific question. In beginning FP programming, you don't generally want to pass the answer down to recursive calls. You want to pass a smaller problem down to the recursive call, and get the answer back from it. For the substring problem, an example of a smaller problem is to ask for the substring that starts one character further along in the containing string, and that is one character shorter.
(Later on, you might want to pass partial answers down to your recursive calls in order to get tail-recursive behavior. I say don't worry about it for now.)
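To show the shape of "pass a smaller problem down and get the answer back", here is a sketch in Python rather than OCaml, so the translation is still left to you. It assumes the function takes a start index and a length (the question's version takes start and stop instead); s[start] plays the role of s.[start] and + plays the role of ^.

```python
def substring(s, start, length):
    """Return the `length` characters of `s` beginning at index `start`."""
    if length == 0:
        # Base case: the empty substring needs no characters at all.
        return ""
    # Smaller problem: the substring that starts one character further along
    # and is one character shorter. Its answer is returned to us, and we
    # prepend the one character this call is responsible for.
    return s[start] + substring(s, start + 1, length - 1)


print(substring("abcdef", 2, 3))  # cde
```

Note that nothing is passed down except the description of the smaller problem; the result is assembled on the way back up.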
Now I can't give you the answer to this, partly because it's your homework, and partly because it's been 3 years since I've touched OCaml syntax, but I could try to help you along.
Now the Basic principle behind recursion is to break a problem down into smaller versions of itself.
You don't pass the string that is slowly being built up, instead use your recursive function to generate a string that is almost built up except for a single character, and then you add that character to the end of the string.
I'm trying to create a validator for a string that may contain 1-N words, which are separated by exactly 1 whitespace character (spaces only between words). I'm a newbie at regex, so I feel a bit confused, because my expression seems to be correct:
^[[a-zA-Z]+\s{1}]{0,}[a-zA-Z]+$
What am I doing wrong here? (it accepts only 2 words .. but I want it to accept 1+ words)
Any help is greatly appreciated :)
As often happens with someone beginning a new programming language or syntax, you're close, but not quite! The ^ and $ anchors are being used correctly, and the character classes [a-zA-Z] will match only letters (sounds right to me), but your repetition is a little off, and your grouping is not what you think it is - which is your primary problem.
^[[a-zA-Z]+\s{1}]{0,}[a-zA-Z]+$
 ^           ^^^^^^^^
 a           bbbacccc
It only matches two words because you effectively don't have any group repetition; this is because you don't really have any groups - only character classes. The simplest fix is to change the first [ and its matching closing bracket (marked by a's in the listing above) to parentheses:
^([a-zA-Z]+\s{1}){0,}[a-zA-Z]+$
This single change will make it work the way you expect! However, there are a few recommendations and considerations I'd like to make.
First, for readability and code maintenance, use the single character repetition operators instead of repetition braces wherever possible. * repeats zero or more times, + repeats one or more times, and ? repeats zero or one time (AKA optional). Your repetition curly braces are syntactically correct, and do what you intend them to, but one (marked by b's above) should be removed because it is redundant, and the other (marked by c's above) should be shortened to an asterisk *, as they have exactly the same meaning:
^([a-zA-Z]+\s)*[a-zA-Z]+$
Second, I would recommend considering (depending upon your application requirements) the \w shorthand character class instead of the [a-zA-Z] character class, with the following considerations:
it matches both upper and lowercase letters
it does match more than letters (it matches digits 0-9 and the underscore as well)
it can often be configured to match non-English (Unicode) letters for multi-lingual input
If any of these are unnecessary or undesirable, then you're on the right track!
On a side note, the character combination \b is a word-boundary assertion and is not needed for your case, as you will already begin and end where there are letters and letters only!
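The simplified pattern is easy to sanity-check. Below is a quick sketch using Python's re module (an assumption; the question does not say which engine is in use). The name is_valid is made up, and fullmatch is used in place of the ^ and $ anchors. One thing it highlights: \s accepts any whitespace character, including tabs, so if only a literal space should be allowed, replace \s with a space.

```python
import re

# Same pattern as above; fullmatch() stands in for the ^ and $ anchors.
WORDS = re.compile(r"([a-zA-Z]+\s)*[a-zA-Z]+")


def is_valid(text):
    """True if text is one or more words separated by single whitespace characters."""
    return WORDS.fullmatch(text) is not None


print(is_valid("hello"))            # True
print(is_valid("hello big world"))  # True
print(is_valid("hello  world"))     # False (two spaces)
print(is_valid(" hello"))           # False (leading space)
print(is_valid("hello "))           # False (trailing space)
print(is_valid("hello\tworld"))     # True (\s also matches a tab)
```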
As for learning more about regular expressions, I would recommend Regular-Expressions.info, which has a wealth of info about regexes and the inner workings and quirks of the various implementations. I also use a tool called RegexBuddy to test and debug expressions.
I have ValidationRegularExpression="[0-9]" which only allows a single character. How do I make it allow between (and including) 1 and 7 digits? I tried [0-9]{1-7} but it didn't work.
You got the syntax almost correct: [0-9]{1,7}.
You can make your solution a bit more elegant (and culture-sensitive) by replacing [0-9] with the generic character group "decimal digit": \d (remember that other languages might use different characters for digits than 0-9).
And here's the documentation for future reference:
.NET Framework Regular Expressions
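For a quick check of the quantifier, here is a sketch in Python's re module rather than .NET (an assumption; the behaviour shown is what Python does, and the helper name is invented). It also hints at why {1-7} "didn't work": Python does not reject it, it treats the braces as literal text. The last line illustrates the point about \d accepting digits other than 0-9.

```python
import re


def is_one_to_seven_digits(text):
    """True if text consists of between 1 and 7 ASCII digits."""
    return re.fullmatch(r"[0-9]{1,7}", text) is not None


print(is_one_to_seven_digits("5"))         # True
print(is_one_to_seven_digits("1234567"))   # True
print(is_one_to_seven_digits("12345678"))  # False (8 digits)
print(is_one_to_seven_digits(""))          # False

# {1-7} is not a valid quantifier, so Python matches it as literal characters.
print(re.fullmatch(r"[0-9]{1-7}", "5{1-7}") is not None)  # True

# \d also accepts non-ASCII decimal digits, here ARABIC-INDIC DIGITS 1, 2, 3.
print(re.fullmatch(r"\d{1,7}", "\u0661\u0662\u0663") is not None)  # True
```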
If you want to avoid leading zeros, you can use this:
^(?!0\d)\d{1,7}$
The first part is a negative lookahead assertion that checks whether the string starts with a 0 followed by another digit. If it does, there is no match.
Check online here: http://regexr.com?2thtr
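The same pattern can be tried locally. This is a small sketch in Python's re module (an assumption, since the original context is a .NET validator); the name is_valid_number is made up for illustration. Note that a lone 0 is still accepted, because the lookahead only rejects a 0 that is followed by another digit.

```python
import re

NO_LEADING_ZERO = re.compile(r"^(?!0\d)\d{1,7}$")


def is_valid_number(text):
    """True for 1 to 7 digits with no leading zero (a lone "0" is allowed)."""
    return NO_LEADING_ZERO.search(text) is not None


print(is_valid_number("0"))         # True
print(is_valid_number("7"))         # True
print(is_valid_number("1234567"))   # True
print(is_valid_number("0123"))      # False (leading zero)
print(is_valid_number("12345678"))  # False (8 digits)
```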