Removing left recursion - BNF

I am feeling quite nostalgic so I've decided to write an adventure game creator that allows complex sentences to be entered by the user.
I have hand-rolled a lexer and parser that uses the visitor pattern, and it worked quite well until I encountered a left-recursion issue with one of my BNF (Backus-Naur Form) rules:
object ::= {adjective} noun
| object AND {adjective} noun
After removing the left-recursion by following this wiki entry, does this look right?
object ::= {adjective} noun object2
object2 ::= AND {adjective noun}
| !Empty
Edited:
I am hand-rolling the lexer and parser in C# by following the guide given here. I am not using any parser generators for this exercise.
Also, I got the BNF rules for the parser from this website.

Is there any reason to prefer left-recursion over right recursion in this case? Wouldn't this be simpler:
object ::= {adjective} noun |
{adjective} noun AND object
If you really want to make your solution work, you need to make object2 right-recursive:
object ::= {adjective} noun object2
object2 ::= AND {adjective noun} object2 |
!Empty

Usually when I have this rule (yours):
object ::= {adjective} noun |
object AND {adjective} noun
I apply the following transformations:
Stage 1 (still left-recursive):
object ::= ad_noun |
object AND ad_noun
ad_noun ::= {adjective} noun
Stage 2 (converted to right recursion):
object ::= ad_noun |
ad_noun AND object
ad_noun ::= {adjective} noun
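The Stage 2 right-recursive form drops straight into a hand-rolled recursive-descent parser. Below is a minimal sketch; it is in Python rather than your C# just to keep it short, and the ADJ/NOUN/AND token kinds and the tuple-shaped token stream are assumptions about your lexer, not your actual code:

# Recursive descent for:
#   object  ::= ad_noun | ad_noun AND object
#   ad_noun ::= {adjective} noun
def parse_object(tokens, pos=0):
    """Returns (list of noun phrases, next position)."""
    phrase, pos = parse_ad_noun(tokens, pos)
    if pos < len(tokens) and tokens[pos][0] == "AND":
        rest, pos = parse_object(tokens, pos + 1)   # right recursion: recurse after AND
        return [phrase] + rest, pos
    return [phrase], pos

def parse_ad_noun(tokens, pos):
    adjectives = []
    while pos < len(tokens) and tokens[pos][0] == "ADJ":    # {adjective}: zero or more
        adjectives.append(tokens[pos][1])
        pos += 1
    if pos >= len(tokens) or tokens[pos][0] != "NOUN":
        raise SyntaxError("expected a noun at position %d" % pos)
    return {"adjectives": adjectives, "noun": tokens[pos][1]}, pos + 1

# "rusty key AND small brass lamp"
tokens = [("ADJ", "rusty"), ("NOUN", "key"), ("AND", "and"),
          ("ADJ", "small"), ("ADJ", "brass"), ("NOUN", "lamp")]
print(parse_object(tokens))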

Related

Backus–Naur form with boolean algebra. Problem with brackets and parse tree

I want to write those boolean expressions in Backus-Naur form.
What I have got is:
<variable> ::= <signal> | <operator> | <bracket> <variable>
<signal> ::= <p> | <q> | <r> | <s>
<operator> ::= <AND> | <OR> | <implication> | <equivalence> | <NOT>
<bracket> ::= <(> | <)>
I made a recursion with <bracket> <variable>, so that whenever there is a bracket it starts a new instance, but I still do not know when to close the brackets. With this you are able to place a closing bracket and start a new instance, but I only want that for opening brackets.
Can I separate <bracket> into <opening bracket> and <closing bracket>?
Is my Backus-Naur form even correct? There isn't much information about Backus-Naur form with boolean algebra on the internet. What does the parse tree of this look like?
I assume that you want to define a grammar for boolean expressions using the Backus-Naur form, and that your examples are concrete instances of such expressions. There are multiple problems with your grammar:
First of all, you want your grammar to only generate correct boolean expressions. With your grammar, you could generate a lone operator ∨ as a valid expression using the path <variable> -> <operator> -> <OR>, which is clearly wrong since the operator is missing its operands. In other words, ∨ on its own cannot be a correct boolean expression. Various other incorrect expressions can be derived with your grammar. For the same reason, the opening and closing brackets should appear together somewhere within a production rule, since you want to ensure that every opening bracket has a closing bracket. Putting them in separate production rules might destroy that guarantee, depending on the overall structure of your grammar.
Secondly, you want to differentiate between non-terminal symbols (the ones that are refined by production rules, i.e. the ones written between < and >) and terminal symbols (atomic symbols like your variables p, q, r and s). Hence, your non-terminal symbols <p>, <q>, <r> and <s> should be terminal symbols p, q, r and s. Same goes for other symbols like brackets and operators.
Thirdly, in order to get an unambiguous parse tree, you want to get the precedence and associativity of your operators right, i.e., you want to make sure that, for example, negation is evaluated before implication, since it has a higher precedence (similar to arithmetic expressions, where multiplication must be evaluated before addition). In other words, we want operators with higher precedence to appear closer to the leaf nodes of the parse tree, and operators with lower precedence to appear closer to the root node of the tree, since the leaves of the tree are evaluated first. We can achieve that by defining our grammar in a way that reflects the precedences of the operators in decreasing order:
<expression> ::= <expression> ↔ <implication> | <implication>
<implication> ::= <implication> → <disjunction> | <disjunction>
<disjunction> ::= <disjunction> ∨ <conjunction> | <conjunction>
<conjunction> ::= <conjunction> ∧ <negation> | <negation>
<negation> ::= ¬ <negation> | <variable> | ( <expression> )
<variable> ::= p | q | r | s
Starting with <expression>, we can see that a valid boolean expression chains together all the ↔ operators first, then all the → operators, then all the ∨ operators, and so on, according to their precedence. Hence, operators with lower precedence (e.g., ↔) are located near the root of the tree, while operators with higher precedence (e.g., ¬) are located near the leaves of the tree.
Note that the grammar above is left-recursive, which might cause problems with software tools that cannot handle left recursion (e.g., some parser generators).
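As a rough sketch of how a recursive-descent parser might still implement this grammar, each left-recursive binary rule can be turned into a loop, which builds exactly the left-associative trees the rule describes. This is only an illustration: the ASCII spellings !, &, |, ->, <-> stand in for ¬, ∧, ∨, →, ↔, and the tokenisation and tuple-based tree shape are assumptions.

def parse(tokens):
    tree, pos = parse_expression(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing input at position %d" % pos)
    return tree

def parse_binary(tokens, pos, operator, next_level):
    # <level> ::= <level> OP <next> | <next>   is parsed as   <next> (OP <next>)*
    left, pos = next_level(tokens, pos)
    while pos < len(tokens) and tokens[pos] == operator:
        right, pos = next_level(tokens, pos + 1)
        left = (operator, left, right)           # fold to the left: left-associative
    return left, pos

def parse_expression(tokens, pos):  return parse_binary(tokens, pos, "<->", parse_implication)
def parse_implication(tokens, pos): return parse_binary(tokens, pos, "->", parse_disjunction)
def parse_disjunction(tokens, pos): return parse_binary(tokens, pos, "|", parse_conjunction)
def parse_conjunction(tokens, pos): return parse_binary(tokens, pos, "&", parse_negation)

def parse_negation(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "!":
        inner, pos = parse_negation(tokens, pos + 1)
        return ("!", inner), pos
    if tok == "(":
        inner, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("missing closing bracket")
        return inner, pos + 1
    if tok in ("p", "q", "r", "s"):
        return tok, pos + 1
    raise SyntaxError("unexpected token %r" % tok)

# !p & q -> r  parses with ! binding tightest, then &, then ->:
print(parse(["!", "p", "&", "q", "->", "r"]))
# ('->', ('&', ('!', 'p'), 'q'), 'r')

(Following the grammar exactly as written, → and ↔ also come out left-associative; if you want right-associative implication you would change that one rule.)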

Parser combinator for propositional logic

I'd like to design a combinator to parse Propositional Logic. Here's a simple BNF:
<sentence> ::= <atomic-sentence> | <complex-sentence>
<atomic-sentence> ::= True | False | P | Q | R
<complex-sentence> ::= (<sentence>)
| <sentence> <connective> <sentence>
| ¬<sentence>
<connective> ::= ∧ | ∨ | ⇒ | ⇔
The problem is that the grammar is left-recursive, which leads to an infinite loop: a sentence can be a complex sentence, which can start with a sentence, which can be a complex sentence, ... forever. Here's an example sentence that causes this problem:
P∧Q
Is there a simple way to fix the grammar so that it is suitable for a parser combinator? Thanks.
FWIW, I'm using FParsec in F#, but I think any parser combinator library would have the same issue.
FParsec can handle infix operators using the OperatorPrecedenceParser class: you just specify which operators you have, with which associativity and precedence, without having to actually write a grammar for your infix expressions. The rest of this answer explains how to solve the problem without this class, for cases where the class doesn't apply, for parser combinator libraries that don't have an equivalent, or in case you simply don't want to use it or are interested in how you'd solve the problem without it.
Parser combinators tend to not support left-recursion, but they do tend to support repetition. Luckily, a left-recursive rule of the form <a> ::= <a> <b> | <c> can be rewritten using the * repetition operator to <a> ::= <c> <b>*. If you then left-fold over the resulting list, you can construct a tree that looks just like the parse tree you'd have gotten from the original grammar.
So if we first inline <complex-sentence> into <sentence> and then apply the above pattern, we get <a> = <sentence>, <b> = <connective> <sentence> and <c> = <atomic-sentence> | '(' <sentence> ')' | ¬<sentence>, resulting in the following rule after the transformation:
<sentence> ::= ( <atomic-sentence>
| '(' <sentence> ')'
| ¬<sentence>
) (<connective> <sentence>)*
To improve readability, we'll put the parenthesized part into its own rule:
<operand> ::= <atomic-sentence>
| '(' <sentence> ')'
| ¬<sentence>
<sentence> ::= <operand> (<connective> <sentence>)*
Now if you try this grammar, you'll notice something strange: the list created by the * will only ever contain a single element (or none). This is because if there are more than two operands, the right-recursive call to <sentence> will eat up all the remaining operands, creating a right-associative parse tree.
So really the above grammar is equivalent to this (or rather the grammar is ambiguous, but a parser combinator will treat it as if it were equivalent to this):
<sentence> ::= <operand> <connective> <sentence>
This happened because the original grammar was ambiguous. The ambiguous definition <s> ::= <s> <c> <s> | <o> can either be interpreted as the left-recursive <s> ::= <s> <c> <o> | <o> (which will create a left-associative parse tree) or the right-recursive <s> ::= <o> <c> <s> | <o> (right-associative parse tree). So we should first remove the ambiguity by choosing one of those forms and then apply the transformation if applicable.
So if we choose the left-recursive form, we end up with:
<sentence> ::= <operand> (<connective> <operand>)*
Which will indeed create lists with more than one element. Alternatively if we choose the right-recursive rule, we can just leave it as-is (no repetition operator necessary) as there is no left-recursion to eliminate.
As I said, we can now get a left-associative tree by taking the list from the left-recursive version and left-folding it or a right-associative one by taking the right-recursive version. However both of these options will leave us with a tree that treats all of the operators as having the same precedence.
To fix the precedence you can either apply something like the shunting yard algorithm to the list or you can first re-write the grammar to take precedence into account and then apply the transformation.
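To make the left-fold concrete, here is a small sketch. It is plain Python rather than FParsec/F# (in a Parsec-family library this fold is typically what you would pass to a chainl1-style combinator), and the tuple-shaped tree nodes are an assumption:

# Fold the output of  <sentence> ::= <operand> (<connective> <operand>)*
# from the left, so the tree comes out left-associative, just as the
# left-recursive grammar describes.
from functools import reduce

def fold_left(first, pairs):
    # pairs is the list produced by the * repetition: [(connective, operand), ...]
    return reduce(lambda acc, pair: (pair[0], acc, pair[1]), pairs, first)

# P ∧ Q ∨ R parsed as operand "P" followed by [("∧", "Q"), ("∨", "R")]:
print(fold_left("P", [("∧", "Q"), ("∨", "R")]))
# ('∨', ('∧', 'P', 'Q'), 'R')  -- i.e. (P ∧ Q) ∨ R, ignoring precedence

Folding from the right over the right-recursive version would instead give ('∧', 'P', ('∨', 'Q', 'R')), a right-associative tree; either way every connective ends up at the same precedence level, which is why the shunting-yard algorithm or a precedence-aware grammar is still needed, as noted above.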

Common Lisp: A good way to represent grammar rules?

This is a Common Lisp data representation question.
What is a good way to represent grammars? By "good" I mean a representation that is simple, easy to understand, and I can operate on the representation without a lot of fuss. The representation doesn't have to be particularly efficient; the other properties (simple, understandable, process-able) are more important to me.
Here is a sample grammar:
Session → Facts Question
Session → ( Session ) Session
Facts → Fact Facts
Facts → ε
Fact → ! STRING
Question → ? STRING
The representation should allow the code that operates on the representation to readily distinguish between terminal symbols and non-terminal symbols.
Non-terminal symbols: Session, Facts, Fact, Question
Terminal symbols: (, ), ε, !, ?
This particular grammar uses parentheses symbols, which conflicts with Common Lisp's use of parentheses symbols. What's a good way to handle that?
I want my code to be able to recognize the symbol for the empty string, ε. What's a good way to represent the symbol for the empty string, ε?
I want my code to be able to distinguish between the left-hand side and the right-hand side of a grammar rule.
Below are some common operations that I want to perform on the representation.
Consider this rule:
A → u1 u2 ... un
Operations: I want to get the first symbol of a grammar rule's right-hand side. Then I want to know: is it a terminal symbol? Is it the ε-symbol? If it's a non-terminal symbol, then I want to get its grammar rule.
GRAIL (GRAmmar In Lisp)
Description of GRAIL
Slightly modified version of GRAIL with a function generator included
I'm including the BNF of GRAIL from the second link in case it expires:
<grail-list> ::= "'(" {<grail-rule>} ")"
<grail-rule> ::= <assignment> | <alternation>
<assignment> ::= "(" <type> " ::= " <s-exp> ")"
<alternation> ::= "(" <type> " ::= " <type> {<type>} ")"
<s-exp> ::= <symbol> | <nonterminal> | "(" {<s-exp>} ")"
<type> ::= "#(" <type-name> ")"
<nonterminal> ::= "#(" {<arg-name> " "} <type-name> ")"
<type-name> ::= <symbol>
<arg-name> ::= <symbol>
DCG Format (Definite Clause Grammar)
There is an implementation of a definite clause grammar in Paradigms of Artificial Intelligence Programming. Technically it's Prolog, but it's all implemented as Lisp in the book.
Grammar of English in DCG Format as used in PAIP
DCG Parser
Hope this helps!
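Whichever Lisp representation you settle on (GRAIL, DCG, or plain s-expressions), the operations listed in the question reduce to a few lookups over the rule list. Below is a minimal sketch of that idea, written in Python only to keep the examples on this page in one language; each nested list maps directly onto an s-expression. All the names are made up for illustration: terminals are plain strings (which also sidesteps the clash with Lisp's own parentheses), non-terminals are the strings that appear as a left-hand side, and ε gets a dedicated marker.

EPSILON = "epsilon"

# Each rule is a (lhs, rhs) pair; rhs is a list of symbols.
GRAMMAR = [
    ("Session",  ["Facts", "Question"]),
    ("Session",  ["(", "Session", ")", "Session"]),
    ("Facts",    ["Fact", "Facts"]),
    ("Facts",    [EPSILON]),
    ("Fact",     ["!", "STRING"]),
    ("Question", ["?", "STRING"]),
]

NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def is_nonterminal(symbol):
    return symbol in NONTERMINALS

def is_epsilon(symbol):
    return symbol == EPSILON

def rules_for(nonterminal):
    return [rhs for lhs, rhs in GRAMMAR if lhs == nonterminal]

# The operations from the question, applied to the first Session rule:
lhs, rhs = GRAMMAR[0]
first = rhs[0]                        # first symbol of the right-hand side
print(first, is_nonterminal(first))   # Facts True
print(is_epsilon(first))              # False
print(rules_for(first))               # [['Fact', 'Facts'], ['epsilon']]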

How to correctly translate BNF to GoldParser?

Say I have this in BNF:
a ::= b {c}
| d {e}
Is there any way to translate to Gold-Parser? Without breaking it up like this:
<a> ::= <b> <c>
<c> ::=
| <c> terminal
Side Note: If anybody has a better title/more tags, please edit it, thanks!
Is there any way to translate to Gold-Parser? Without breaking it up
No, it doesn't support the repetition operator ({x}) as part of rule definitions, so you must encode it with multiple rules.
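For the two-alternative rule in the question, the full encoding therefore needs one helper rule per repetition. The helper names below (<c_list>, <e_list>) are made up for illustration:
<a> ::= <b> <c_list>
| <d> <e_list>
<c_list> ::=
| <c_list> <c>
<e_list> ::=
| <e_list> <e>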
See also Converting EBNF to BNF

Converting EBNF to BNF basics

I'm not quite sure how to answer a question for my computer languages class. I am to convert the following statement from EBNF form to BNF form:
EBNF: expr --> [-] term {+ term}
I understand that expressions included within curly braces are to be repeated zero or more times, and that things included within square brackets are optional (zero or one occurrences). If my understanding is correct, would this be a correct conversion?
My BNF:
expr --> expr - term
| expr + term
| term
Bonus Reading
Converting EBNF to BNF (general rules)
I don't think that's correct. In fact, I don't think the EBNF is actually valid EBNF. The answer to the question How to convert BNF to EBNF shows how valid EBNF is constructed, quoting from ISO/IEC 14977:1996, the Extended Backus-Naur Form standard.
I think the expression:
expr --> [-] term {+ term}
should be written:
expr = [ '-' ], term, { '+', term };
This means that an expression consists of an optional minus sign, followed by a term, followed by a sequence of zero of more occurrences of a plus sign and a term.
Next question: which dialect of BNF are you targeting? Things get tricky here; there are many dialects. However, here's one possible translation:
<expr> ::= [ MINUS ] <term> <opt_add_term_list>
<opt_add_term_list> ::= /* Nothing */
| <opt_add_term_list> <add_term>
<add_term> ::= PLUS <term>
Where MINUS and PLUS are terminals (for '-' and '+'). This is a very austere but minimal BNF. Another possible translation would be:
<expr> ::= [ MINUS ] <term> { PLUS <term> }*
Where the { ... }* part means zero or more of the contained pattern ... (so PLUS <term> in this example). Or you could use quoted characters:
<expr> ::= [ '-' ] <term> { '+' <term> }*
And so the list of possible alternatives goes on. You'll have to look at the definition of BNF you were given to work to, and you should complain about the very sloppy EBNF you were given, if it was meant to be ISO standard EBNF. If it was just some random BNF-style language called EBNF, I guess it is just the name that is confusing. Private dialects are fine as long as they're defined, but it isn't possible for people not privy to the dialect to know what the correct answer is.
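As a sanity check that the austere translation really covers the repetition, here is one derivation to the sentential form MINUS <term> PLUS <term> PLUS <term> (the case with the optional sign present and two repetitions of the '+ term' group):
<expr>
=> MINUS <term> <opt_add_term_list>
=> MINUS <term> <opt_add_term_list> <add_term>
=> MINUS <term> <opt_add_term_list> <add_term> <add_term>
=> MINUS <term> <add_term> <add_term>    (the list finally derives /* Nothing */)
=> MINUS <term> PLUS <term> PLUS <term>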

Resources