JavaCC - choice based on return type?

I have an ifElse Statement which can be of the following two types
a) ifElse(condition, expression_bool_result, expression_bool_result)
where expression_bool_result may be TRUE/FALSE, the result of and(), or(), ==, != ..., or a further ifElse
b) ifElse(condition, expression_arith_result, expression_arith_result)
where expression_arith_result may be any number, the result of calculations or of further functions returning a number ..., or a further ifElse
Since I am new to JavaCC, I would like to ask what a production could look like that lets the parser make a clear decision.
Currently I get the warning
Warning: Choice conflict involving two expansions at
line 824, column 5 and line 825, column 5 respectively.
A common prefix is: "ifElse" "("
Consider using a lookahead of 3 or more for earlier expansion.
which - as far as I can tell - implies that my grammar (regarding ifElse) is ambiguous.
If there is no way to write it unambiguously, what could the suggested lookahead look like?
Thanks for your feedback in advance!

No fixed amount of lookahead could possibly resolve this ambiguity in all cases. You could have an arbitrarily long stream of tokens that forms a valid expression_arith_result but is then followed by a comparison operator and another arithmetic value, thus turning it into an expression_bool_result.
The solution would be to have a single ifElse statement, that takes two arbitrary expressions. The required agreement in type between the two expressions would be a matter of semantics, not grammar.
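To illustrate moving the type agreement out of the grammar, here is a minimal sketch in Python (not JavaCC) of a semantic check that runs after parsing. The node classes (Num, Bool, Compare, Add, IfElse) and the function type_of are hypothetical names invented for this example; a real parser would build its own tree.

```python
from dataclasses import dataclass

# Hypothetical AST node types; the parser is assumed to build these.
@dataclass
class Num:
    value: float

@dataclass
class Bool:
    value: bool

@dataclass
class Compare:          # e.g. a == b, always yields a boolean
    left: object
    right: object

@dataclass
class Add:              # e.g. a + b, always yields a number
    left: object
    right: object

@dataclass
class IfElse:           # a single ifElse form taking two arbitrary expressions
    cond: object
    then: object
    other: object

def type_of(expr):
    """Return "bool" or "arith" for expr, raising TypeError on a mismatch."""
    if isinstance(expr, Num):
        return "arith"
    if isinstance(expr, Bool):
        return "bool"
    if isinstance(expr, Compare):
        if type_of(expr.left) != type_of(expr.right):
            raise TypeError("comparison operands differ in type")
        return "bool"
    if isinstance(expr, Add):
        if type_of(expr.left) != "arith" or type_of(expr.right) != "arith":
            raise TypeError("addition needs arithmetic operands")
        return "arith"
    if isinstance(expr, IfElse):
        if type_of(expr.cond) != "bool":
            raise TypeError("ifElse condition must be boolean")
        then_type = type_of(expr.then)
        if then_type != type_of(expr.other):
            raise TypeError("ifElse branches differ in type")
        return then_type
    raise TypeError("unknown node: %r" % (expr,))
```

With this split, the grammar only needs one ifElse production, and a nested ifElse simply takes the type of its branches.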

Jason's answer is correct in that you can't resolve the choice with a fixed length of lookahead. However JavaCC does not limit you to fixed length of lookahead. So you can do the following.
void IfExpression() :
{ }
{ LOOKAHEAD( <IFELSE> "(" Condition() "," BooleanExpression() )
BooleanIfExpression()
|
ArithmeticIfExpression()
}

Related

How can I write an unambiguous nearley grammar for boolean search operators

The Context
I am climbing the Nearley learning curve and trying to write a grammar for a search query parser.
The Goal
I would like to write a grammar that is able to parse a query string that contains boolean operators (e.g. AND, OR, NOT). Let's use AND for this question as a trivial case.
For instance, the grammar should recognize these example strings as valid:
pants
pants AND socks
jumping jacks
The Attempt
My naive attempt looks something like this:
query ->
statement
| statement "AND" statement
statement -> .:+
The Problem
The above grammar attempt is ambiguous because .:+ will match literally any string.
What I really want is for the first condition to match any string that does not contain AND in it. Once "AND" appears I want to enter the second condition only.
The Question
How can I detect these two distinct cases without having ambiguous grammar?
I am worried I'm missing something fundamental; I can imagine a ton of use cases where we want arbitrary text split up by known operators.
Yeah, if you've got an escape hatch that could be literally anything, you're going to have a problem.
Somewhere you're going to want to define what your base set of tokens are, at least something like \S+ and then how those tokens can be composed.
The place I'd typically start for a parser is trying to figure out where recursion is accounted for in the parser, and what approach to parsing the lib you're relying on takes.
It looks like Nearley is an Earley parser, and as the Wikipedia entry for Earley parsers notes, they're efficient for left-recursion.
This is just hazarding a guess, but something like this might get you to conjunction at least.
CONJUNCTION -> AND | OR
STATEMENT -> TOKENS | (TOKENS CONJUNCTION STATEMENT)
TOKENS -> [^()]+
A structure like this should be unambiguous and bans parentheses in tokens, unless they're surrounded by double quotes.
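The idea of fixing a base token set first and only then composing tokens can be sketched outside Nearley. The following is a rough Python illustration, not a Nearley grammar: it assumes whitespace-separated tokens (\S+), treats the exact words AND and OR as operators, and uses the hypothetical function name parse_query.

```python
import re

OPERATORS = {"AND", "OR"}

def parse_query(query):
    """Split a query into search terms and the operators between them.

    Tokens are runs of non-whitespace. A token that is exactly an operator
    word ends the current term, so a term can never contain an operator.
    """
    terms, operators, current = [], [], []
    for token in re.findall(r"\S+", query):
        if token in OPERATORS:
            if not current:
                raise ValueError("operator %r has no left operand" % token)
            terms.append(" ".join(current))
            operators.append(token)
            current = []
        else:
            current.append(token)
    if not current:
        raise ValueError("query is empty or ends with an operator")
    terms.append(" ".join(current))
    return terms, operators
```

Because operator words are recognized at the token level, "pants AND socks" has exactly one reading, which is what the ambiguous .:+ rule could not guarantee.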

Why is it valid to apply a predicate to a string in an xpath?

I couldn't find any answers by googling. A coworker mixed up css and xpath predicates and produced the following xpath:
$x("//*[contains(@class, 'btn'[someattr='somevalue'])]")
This xpath returns all elements in the document.
My best understanding is that it is apparently possible to apply predicates to strings ('btn') and that the predicate does not match the string, thus causing the second argument to "contains" to be empty. This would then match all elements because their class attributes contain at least nothing.
But why is this legal?
And is there a way to construct a predicate that would match the string value and thus make the second parameter not empty? I tried a couple things like 'btn'[text()='btn'] but got the same result as the earlier expression.
This is legal in XPath 2.0 but not in 1.0.
In XPath 1.0, the grammar permits a predicate to be applied to any expression, but the type rules say that it's an error if the expression evaluates to anything other than a node-set. (This is a common approach in many languages, to keep the syntax orthogonal but define rules in the type system about which operators can be applied to which values).
In XPath 2.0, all values are sequences, and any sequence can be filtered by a predicate, so it makes perfect sense to filter a sequence of strings, and since a single string is itself a sequence of strings, there is no reason to prohibit that case. In fact it is useful, and I use it frequently: consider:
print("Found " || $n || " error" || "s"[$n gt 1])
Your example would fail even in XPath 2.0, however, because the lhs of the predicate someattr='somevalue' means child::someattr and you can't use an axis expression when the context item is a string.

DFA diagram to recognize arithmetic expressions

I need to draw a DFA diagram that can recognize arithmetic expressions; variables or brackets are not allowed. It can only contain numbers and the four arithmetic operators.
It has to accept any number string with or without a sign - e.g. 5, -7, +15.
The number strings can be mixed with arithmetic operators - e.g. 3+5, -1+7*3.
I don't know if my diagram actually meets these requirements.
No, your diagram is not good enough yet: it allows +/-/+++ as an expression.
I'd start formulating this using something like EBNF first, then turn that into a regular expression (essentially inlining the non-terminals), and then build a DFA from that.
What is a number? Only integers, or can you have a decimal point? If decimal points are fine, do you need digits before and after, or just one of them? Do you allow scientific notation with "e" in it? May there be leading zeros in numbers, or only if the whole integer part is zero? And what about signs? Do you allow more than one? Do you allow a unary plus or minus sign after any arithmetic operator?
Depending on your answers above, the EBNF might look somewhat like this:
digits = digit { digit }
number = [ "+" | "-" ] ( digits [ "." [ digits ] ] | "." digits )
operator = "+" | "-" | "*" | "/"
expression = number { operator number }
Turning these things to regular expressions:
digits : [0-9]+
number : [+\-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)
operator : [+\-*/]
expression: [+\-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([+\-*/][+\-]?([0-9]+(\.[0-9]*)?|\.[0-9]+))*
Now that final regexp is what you can use to build a DFA from. May it have ε transitions or do you need inputs on all edges? Must it be deterministic? Depending on these answers, there may be more or less work ahead of you.
You don't have to mechanically turn the regexp into an automaton; with a bit of human intuition you can build a simpler automaton. But make sure that all the considerations captured in building these expressions are reflected in your automaton. Avoid allowing more than one decimal point. Avoid chaining operators. Make sure every number has at least one digit. Things like this.
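As one possible hand-built automaton for the final regular expression above, here is a sketch of a five-state DFA written as a Python transition table. The state names are made up for this example, and it assumes the same answers as the EBNF: an optional single sign per number, a decimal point allowed, and no scientific notation.

```python
# Transition table: state -> {character class -> next state}.
# After an operator the automaton returns to "start", because what may
# follow an operator is exactly what may begin an expression.
TRANSITIONS = {
    "start":  {"sign": "signed", "digit": "int", "dot": "dot"},
    "signed": {"digit": "int", "dot": "dot"},
    "int":    {"digit": "int", "dot": "frac", "sign": "start", "muldiv": "start"},
    "dot":    {"digit": "frac"},
    "frac":   {"digit": "frac", "sign": "start", "muldiv": "start"},
}
ACCEPTING = {"int", "frac"}

def char_class(ch):
    """Map one input character to its class, or None if it is not allowed."""
    if ch in "0123456789":
        return "digit"
    if ch in "+-":
        return "sign"      # acts as a sign or as an operator, depending on state
    if ch in "*/":
        return "muldiv"
    if ch == ".":
        return "dot"
    return None

def accepts(text):
    """Run the DFA over text; a missing transition means rejection."""
    state = "start"
    for ch in text:
        state = TRANSITIONS[state].get(char_class(ch))
        if state is None:
            return False
    return state in ACCEPTING
```

The table rejects a second decimal point (no "dot" edge out of "frac"), chained operators beyond one sign (no "sign" edge out of "signed"), and numbers without digits ("dot" and "signed" are not accepting).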

How to match a minimum number of regex groups or assertions?

OK regex nerds!
I am using regex lookahead assertions for password validation that is similar to the pattern described here:
\A(?=\w{6,10}\z)(?=[^a-z]*[a-z])(?=(?:[^A-Z]*[A-Z]){3})(?=\D*\d)
However, we want to only require that any 3 of the 4 assertions be valid - not necessarily all of them. Any thoughts on how this could be done?
To shorten any kind of pattern, factorize:
\A(?:
(?=\w{6,10}\z) (?=.*[a-z]) (?: (?:.*[A-Z]){3} | .*\d )
|
(?=.*\d) (?=(?:.*[A-Z]){3}) (?: .*[a-z] | \w{6,10}\z )
)
Note that you don't need a lookahead to test the last condition.
Other way, where each condition is optional and that uses a named group to count (.net only):
\A
(?<c>(?=\w{6,10}\z))?
(?<c>(?=[^a-z]*[a-z]))?
(?<c>(?=(?:[^A-Z]*[A-Z]){3}))?
(?<c>(?=\D*\d))?
(?<-c>){3} # decrement c 3 times
(?(c)|(?!$)) # conditional: force the pattern to fail if too few conditions succeed.
There's no "easy" way to do this in a single regular expression. The only way would be to define all possible permutations of the "three out of four" assertions - e.g.
\A(?=\w{6,10}\z)(?=[^a-z]*[a-z])(?=(?:[^A-Z]*[A-Z]){3})| # Maybe no digit
\A(?=[^a-z]*[a-z])(?=(?:[^A-Z]*[A-Z]){3})(?=\D*\d)| # Maybe wrong length
\A(?=\w{6,10}\z)(?=(?:[^A-Z]*[A-Z]){3})(?=\D*\d)| # Maybe no lower
\A(?=\w{6,10}\z)(?=[^a-z]*[a-z])(?=\D*\d) # Maybe not enough uppers
However, this mind-melting regex is clearly not a good solution.
A better approach would be to perform the four checks separately (with regex or otherwise), and verify that at least three of them pass.
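A minimal sketch of that approach in Python, assuming the four conditions from the question (6 to 10 word characters, a lowercase letter, three uppercase letters, a digit). The names CHECKS and passes_policy are hypothetical, and Python's \w and \d match Unicode characters by default, which may differ from the original engine.

```python
import re

# One predicate per condition from the original pattern.
CHECKS = [
    lambda pw: re.fullmatch(r"\w{6,10}", pw) is not None,           # length 6-10, word chars only
    lambda pw: re.search(r"[a-z]", pw) is not None,                 # at least one lowercase letter
    lambda pw: re.search(r"(?:[^A-Z]*[A-Z]){3}", pw) is not None,   # at least three uppercase letters
    lambda pw: re.search(r"\d", pw) is not None,                    # at least one digit
]

def passes_policy(password, required=3):
    """Return True if at least `required` of the checks succeed."""
    passed = sum(1 for check in CHECKS if check(password))
    return passed >= required
```

Each rule stays readable on its own, and changing "3 of 4" to another threshold is a one-argument change.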
...However, let's take a step back here and ask: Why are you doing this?? You're implementing a password entropy check. Based on your fuzzy rules, the following passwords are valid:
AAAa1
password1
LETmein
And the following passwords are invalid:
reallylongsecurepassword8374235359232
HorseBatteryStapleCorrect
I would strongly advise against such a bizarrely restrictive policy.
Brief
The easiest method would be to have separate regular expressions and check in your code whether at least 3 of the 4 succeed. The only way to do this in a single regex is to list all the cases. That being said, the pattern below is probably the easiest way (in regex) to present all options, as it lets you edit each sub-pattern in one location (where it is defined) rather than in several places (which is more prone to bugs). The DEFINE construct is seldom supported by regex engines, but PCRE supports it.
You can also have your code generate each regex permutation. See this question about generating all permutations of a list in Python.
I don't know why you want to do this for passwords, it's considered malpractice, but, since you're asking for it, I figured I'd give you the easiest solution possible in regex... You really should only check minimum length (and complexity if you want [based on algorithms] to show the user how secure your system finds their password to be).
Code
(?(DEFINE)
(?<w>(?=\w{6,10}\z))
(?<l>(?=[^a-z]*[a-z]))
(?<u>(?=(?:[^A-Z]*[A-Z]){3}))
(?<d>(?=\D*\d))
)
\A(?:
(?&w)(?&l)(?&u)|
(?&w)(?&l)(?&d)|
(?&w)(?&u)(?&d)|
(?&l)(?&u)(?&d)
)
Note: The regex above uses the x modifier (ignore whitespace) so that we can nicely organize the content.

Creating own substring functions recursively in Ocaml

How can I write a substring function in OCaml without using any assignments, lists, or iteration, only recursion? I can only use String.length.
What I have tried so far is
let substring s s2 start stop=
if(start < stop) then
substring s s2 (start+1) stop
else s2;;
but obviously it is wrong. The problem is: how can I pass along the string that is being built gradually by the recursive calls?
This feels like a homework problem that is intended to teach you to think about recursion. For me it would be easier to think about the recursion part if you decide on the basic operations you're going to use. You can't use assignments, lists, or iterations, okay. You need to extract parts of your input string somehow, but you obviously can't use the built-in substring function to do this, that would defeat the purpose of the exercise. The only other operation I can think of is the one that extracts a single character from a string:
# "abcd".[2];;
- : char = 'c'
You also need a way to add a character to a string, giving a longer string. But you're not allowed to use assignment to do this. It seems to me you're going to have to use String.make to translate your character to a string:
# String.make 1 'a';;
- : string = "a"
Then you can concatenate two strings using the ^ operator:
# "abc" ^ "def";;
- : string = "abcdef"
Are you allowed to use these three operations? If so, you can start thinking about the recursion part of the substring problem. If not, then I probably don't understand the problem well enough yet to give advice. (Or maybe whoever set up the restrictions didn't expect you to have to calculate substrings? Usually the restrictions are also a kind of hint as to how you should proceed.)
Moving on to your specific question. In beginning FP programming, you don't generally want to pass the answer down to recursive calls. You want to pass a smaller problem down to the recursive call, and get the answer back from it. For the substring problem, an example of a smaller problem is to ask for the substring that starts one character further along in the containing string, and that is one character shorter.
(Later on, you might want to pass partial answers down to your recursive calls in order to get tail-recursive behavior. I say don't worry about it for now.)
Now, I can't give you the answer to this, partly because it's your homework and partly because it's been 3 years since I've touched OCaml syntax, but I can try to help you along.
The basic principle behind recursion is to break a problem down into smaller versions of itself.
You don't pass the string that is slowly being built up, instead use your recursive function to generate a string that is almost built up except for a single character, and then you add that character to the end of the string.
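To make the "smaller problem" idea concrete without giving the OCaml answer, here is a sketch in Python that uses only single-character indexing, len, and concatenation. The signature substring(s, start, length) is an assumption for this example, and start is assumed to be non-negative.

```python
def substring(s, start, length):
    """Return up to `length` characters of s beginning at index `start`.

    The recursive call asks for a smaller problem: the substring that starts
    one character further along and is one character shorter. Its answer
    comes back as the return value; nothing is passed down and built up.
    """
    if length <= 0 or start >= len(s):
        return ""                      # base case: nothing left to take
    return s[start] + substring(s, start + 1, length - 1)
```

The same shape carries over to OCaml with s.[i], String.make, and ^ in place of Python's indexing and +.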

Resources