Converting EBNF to BNF for the rule <<'while'>> ::= while <'Z'> do begin <'A'> { <'A'> } end

I'd like to convert the following EBNF rule to BNF:
<<'while'>> ::= while <'Z'> do begin <'A'> { <'A'> } end ;
How can I do this?
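One common way to do this is to replace each { ... } group with a fresh helper non-terminal that recurses on itself and can also be empty. The Python sketch below illustrates that idea only. The function name ebnf_to_bnf, the helper names (<rep1>, ...), the <empty> marker, and the simplified symbol names <while>, <Z> and <A> are all assumptions made for the example, and nested groups are not handled.

```python
def ebnf_to_bnf(lhs, rhs, helper_prefix="rep"):
    """Rewrite one EBNF rule into plain BNF rules.

    rhs is a list of symbols; a nested list stands for an EBNF { ... } group
    (zero or more repetitions). Nested groups are not supported in this sketch.
    """
    helper_rules = []
    body = []
    for symbol in rhs:
        if isinstance(symbol, list):
            helper = f"<{helper_prefix}{len(helper_rules) + 1}>"
            # { X } becomes: H ::= X H | <empty>  (right-recursive, zero or more)
            helper_rules.append(f"{helper} ::= {' '.join(symbol)} {helper} | <empty>")
            body.append(helper)
        else:
            body.append(symbol)
    return [f"{lhs} ::= {' '.join(body)}"] + helper_rules


rules = ebnf_to_bnf("<while>", ["while", "<Z>", "do", "begin", "<A>", ["<A>"], "end"])
for rule in rules:
    print(rule)
```

Since the rule has the shape <A> { <A> } (one or more), the helper could equally be written as a single "one or more" rule such as <list> ::= <A> | <A> <list>; which form is preferable depends on the parser you feed the BNF to.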

Related

How to solve this grammar recursion?

I found this grammar for a calculator:
<Expression> ::= <ExpressionGroup> | <BinaryExpression> | <UnaryExpression> | <LiteralExpression>
<ExpressionGroup> ::= '(' <Expression> ')'
<BinaryExpression> ::= <Expression> <BinaryOperator> <Expression>
<UnaryExpression> ::= <UnaryOperator> <Expression>
<LiteralExpression> ::= <RealLiteral> | <IntegerLiteral>
<BinaryOperator> ::= '+' | '-' | '/' | '*'
<UnaryOperator> ::= '+' | '-'
<RealLiteral> ::= <IntegerLiteral> '.' | <IntegerLiteral> '.' <IntegerLiteral>
<IntegerLiteral> ::= <Digit> <IntegerLiteral> | <Digit>
<Digit> ::= '0' | '1' |'2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
Source: here
It looks great. So I wrote the lexer and started the parser. Now there is an infinite recursion that I can't solve between Expression and BinaryExpression.
My code for expression:
boolean isExpression() {
    if (isExpressionGroup() || isBinaryExpression() || isUnaryExpression() || isLiteralExpression()) {
        println("Expression!");
        return true;
    }
    println("Not expression.");
    return false;
}
And for binary expression:
boolean isBinaryExpression() {
    if (isExpression()) {
        peek(1);
        if (currentLex.token == Token.BINARY_OPERATOR) {
            peek(2);
            if (isExpression()) {
                peek(3);
                println("Binary expression!");
                return true;
            } else peek(0);
        } else peek(0);
    } else peek(0);
    return false;
}
peek(int) is just a function that looks ahead without consuming any lexemes. My problem: the input is '2*3'. isExpression() gets called. isExpressionGroup() fails because there is no '('. Then isBinaryExpression() gets called, which calls isExpression(). isExpressionGroup() fails again, and isBinaryExpression() gets called again, and so on until the stack overflows.
I know there are ANTLR and JavaCC (and other tools), but I would like to do it without them.
Could anyone give a hand?
Dealing with left recursion in a hand-crafted top-down (recursive-descent) parser is not easy. Parser generators that solve the problem have years of work in them. There are theoretical reasons for that.
The best solution if you don't want to use a tool is to eliminate the left recursion. The problem if you do it "by the book" is that you'll get an ugly grammar and an ugly parser that will be difficult to use.
But there's another solution. You can add enough rules to represent the precedence hierarchy of the operators, which is something you'd have to do anyway unless you want to risk a+b*c being parsed as (a+b)*c.
There are plenty of examples of non-left-recursive grammars for expressions on the Web, and here on SO in particular. I suggest you take one of them and start from there.
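As a rough illustration of the precedence-hierarchy approach, here is a minimal Python sketch of a recursive-descent evaluator. It is not the asker's code: the Parser class, tokenize and evaluate are names made up for this example. Each precedence level gets its own method (expression for + and -, term for * and /, unary, primary), and the left-recursive repetition is written as a loop, so no method ever calls itself without first consuming a token.

```python
import re

# Integer or real literal ("2", "2.", "2.5"), or a single operator/parenthesis.
TOKEN_RE = re.compile(r"\s*(\d+\.\d*|\d+|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        # expression ::= term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term ::= unary (('*' | '/') unary)*
        value = self.unary()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary(self):
        # unary ::= ('+' | '-') unary | primary
        if self.peek() in ("+", "-"):
            op = self.advance()
            operand = self.unary()
            return operand if op == "+" else -operand
        return self.primary()

    def primary(self):
        # primary ::= literal | '(' expression ')'
        token = self.advance()
        if token == "(":
            value = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token[0].isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return float(token) if "." in token else int(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

With this layout, evaluate("2*3") terminates because term() consumes the 2 before it ever looks for an operator, and the while loops make + - * / left-associative without any left-recursive call.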

Fitting order in bnf converter

I have a problem with rule priority in the BNF Converter. Here are some of the rules:
CParams. CallParams ::= [CallParam] ;
separator CallParam "," ;
VarCParam. CallParam ::= Ident ;
ExpCParam. CallParam ::= Exp ;
BExpCParam. CallParam ::= BExp ;
[...]
EVar. Exp3 ::= Ident ;
[...]
BVar. BExp2 ::= Ident ;
I write an example program:
void p(int a) {
    a = a+7;
    print a;
}
main() {
    int i;
    p(i);
}
I expect p(i) to be translated to CParams [VarCParam (Ident "i")], but it is converted to CParams [BExpCParam (BVar (Ident "i"))].
Could you tell me how to change the rules to fix this?
There is a conflict in your grammar: both trees are possible. Happy just chooses one of them, but it probably printed something like this during compilation:
reduce/reduce conflicts: 2
To fix it you have to remove one of those rules:
VarCParam. CallParam ::= Ident ;
BExpCParam. CallParam ::= BExp ;
BVar. BExp2 ::= Ident ;

How to convert BNF to EBNF

How can I convert this BNF to EBNF?
<vardec> ::= var <vardeclist>;
<vardeclist> ::= <varandtype> {;<varandtype>}
<varandtype> ::= <ident> {,<ident>} : <typespec>
<ident> ::= <letter> {<idchar>}
<idchar> ::= <letter> | <digit> | _
EBNF or Extended Backus-Naur Form is ISO 14977:1996, and is available in PDF from ISO for free*. It is not widely used by the computer language standards. There's also a paper that describes it, and that paper contains this table summarizing EBNF notation.
Table 1: Extended BNF
Extended BNF      Operator   Meaning
-------------------------------------------------------------
unquoted words               Non-terminal symbol
" ... "                      Terminal symbol
' ... '                      Terminal symbol
( ... )                      Brackets
[ ... ]                      Optional symbols
{ ... }                      Symbols repeated zero or more times
{ ... }-                     Symbols repeated one or more times†
=                 in         Defining symbol
;                 post       Rule terminator
|                 in         Alternative
,                 in         Concatenation
-                 in         Except
*                 in         Occurrences of
(* ... *)                    Comment
? ... ?                      Special sequence
The * operator is used with a preceding (unsigned) integer number; it does not seem to allow for variable numbers of repetitions, such as 1-15 characters after an initial character to make identifiers up to 16 characters long.
In the standard, open parenthesis ( is called start group symbol and close parenthesis ) is called end group symbol; open square bracket [ is start option symbol and close square bracket ] is end option symbol; open brace { is start repeat symbol and close brace } is end repeat symbol. Single quotes ' are called first quote symbol and double quotes " are second quote symbol.
* Yes, free — even though you can also pay 74 CHF for it if you wish. Look at the Note under the box containing the chargeable items.
The question seeks to convert this 'BNF' into EBNF:
<vardec> ::= var <vardeclist>;
<vardeclist> ::= <varandtype> {;<varandtype>}
<varandtype> ::= <ident> {,<ident>} : <typespec>
<ident> ::= <letter> {<idchar>}
<idchar> ::= <letter> | <digit> | _
The BNF is not formally defined, so we have to make some (easy) guesses as to what it means. The translation is routine (it could be mechanical if the BNF is formally defined):
vardec = 'var', vardeclist, ';';
vardeclist = varandtype, { ';', varandtype };
varandtype = ident, { ',', ident }, ':', typespec;
ident = letter, { idchar };
idchar = letter | digit | '_';
The angle brackets have to be removed around non-terminals; the definition symbol ::= is replaced by =; the terminals such as ; and _ are enclosed in quotes; concatenation is explicitly marked with ,; and each rule is ended with ;. The grouping and alternative operations in the original happen to coincide with the standard notation. Note that explicit concatenation with the comma means that multi-word non-terminals are unambiguous.
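Those steps can be sketched mechanically. The Python below is only an illustration under the same guesses as above: every <name> is a non-terminal, { } and | keep their meaning, and any other run of characters, including the trailing ; in the first rule, is a terminal. The function name bnf_rule_to_ebnf is made up for the example, and terminals containing a single quote are not handled.

```python
import re

# One token is <name>, one of { } |, or a run of any other non-space characters.
TOKEN_RE = re.compile(r"<(\w+)>|([{}|])|([^\s<>{}|]+)")


def bnf_rule_to_ebnf(rule):
    """Convert one rule of the question's BNF dialect to ISO-style EBNF."""
    lhs, rhs = rule.split("::=", 1)
    out = []
    need_comma = False  # True when the previous token ended a complete item
    for nonterminal, bracket, terminal in TOKEN_RE.findall(rhs):
        if bracket == "|":
            out.append(" | ")
            need_comma = False
        elif bracket == "}":
            out.append(" }")
            need_comma = True
        else:
            if need_comma:
                out.append(", ")  # concatenation is explicit in ISO EBNF
            if bracket == "{":
                out.append("{ ")
                need_comma = False
            elif nonterminal:
                out.append(nonterminal)  # angle brackets dropped
                need_comma = True
            else:
                out.append(f"'{terminal}'")  # terminals are quoted
                need_comma = True
    return f"{lhs.strip().strip('<>')} = {''.join(out)};"


bnf = [
    "<vardec> ::= var <vardeclist>;",
    "<vardeclist> ::= <varandtype> {;<varandtype>}",
    "<varandtype> ::= <ident> {,<ident>} : <typespec>",
    "<ident> ::= <letter> {<idchar>}",
    "<idchar> ::= <letter> | <digit> | _",
]
for rule in bnf:
    print(bnf_rule_to_ebnf(rule))
```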
† Casual study of the standard itself suggests that the {...}- notation is not part of the standard, just of the paper. However, as jmmut notes in a comment, the standard does define the meaning of {…}-:
§5.8 Syntactic term
…
When a syntactic-term is a syntactic-factor followed by
an except-symbol followed by a syntactic-exception it
represents any sequence of symbols that satisfies both of
the conditions:
a) it is a sequence of symbols represented by the syntactic-factor,
b) it is not a sequence of symbols represented by the
syntactic-exception.
…
NOTE - { "A" } - represents a sequence of one or more A's because it is a syntactic-term with an empty syntactic-exception.
Remove the angle brackets and put all terminals into quotes:
vardec ::= "var" vardeclist ";"
vardeclist ::= varandtype { ";" varandtype }
varandtype ::= ident { "," ident } ":" typespec
ident ::= letter { idchar }
idchar ::= letter | digit | "_"

Antlr and left-recursive rules

I am trying to write a grammar using ANTLR, but I can't understand how ANTLR handles recursive choices.
I have read lots of articles and forum posts, but I can't solve my problem.
Here is a small part of my grammar:
grammar MyGrammar;
ComponentRef :
IDENT ('[' Expression (',' Expression)* ']')?
;
Expression:
ComponentRef ('(' FunctionArguments ')')?
;
FunctionArguments:
Expression (',' Expression)*
;
IDENT: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
I still don't understand why it doesn't work. There is no ambiguity, is there?
Here are some examples of code my grammar should work with:
a
a[b,c]
a[b[c], d]
func(a)
func(a,b,c)
func[a](b,c)
func(a[b], c[d])
func[a](b[c])
Thank you in advance!
First, be sure to understand lexer and parser rules. Also read the ANTLR Mega Tutorial.
The code only uses lexer rules, which won't work. Although even lexer rules can be recursive (in ANTLR grammars), they are best avoided. Rather, most rules should be parser rules:
componentRef :
IDENT ('[' expression (',' expression)* ']')?
;
expression:
componentRef ('(' functionArguments ')')?
;
functionArguments:
expression (',' expression)*
;
IDENT: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
The grammar above won't recognize the input you've posted, but there are no errors anymore. A grammar that recognizes the input you've posted could look like this (untested!):
parse
: expr* EOF
;
expr
: IDENT (index | call)*
;
index
: '[' expr_list ']'
;
call
: '(' expr_list ')'
;
expr_list
: expr (',' expr)*
;
IDENT
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
SPACE
: (' ' | '\t' | '\r' | '\n')+ {skip();}
;
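To sanity-check the shape of that grammar by hand, here is a small Python sketch of a recursive-descent recognizer that mirrors the expr, index, call and expr_list rules. It is an illustration only, not ANTLR output: the tokenizer and all function names are assumptions, and recognizes() checks a single expr rather than the expr* of the parse rule.

```python
import re

# IDENT, or one of the punctuation tokens used by the grammar.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|[\[\](),]")


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # whitespace is skipped, like the SPACE rule
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(0))
        pos = match.end()
    return tokens


def is_ident(token):
    return token[0].isalpha() or token[0] == "_"


def parse_expr(tokens, pos):
    # expr : IDENT (index | call)*
    if pos >= len(tokens) or not is_ident(tokens[pos]):
        raise SyntaxError("expected IDENT")
    pos += 1
    while pos < len(tokens) and tokens[pos] in ("[", "("):
        closer = "]" if tokens[pos] == "[" else ")"
        pos = parse_expr_list(tokens, pos + 1)  # index and call share expr_list
        if pos >= len(tokens) or tokens[pos] != closer:
            raise SyntaxError(f"expected {closer!r}")
        pos += 1
    return pos


def parse_expr_list(tokens, pos):
    # expr_list : expr (',' expr)*
    pos = parse_expr(tokens, pos)
    while pos < len(tokens) and tokens[pos] == ",":
        pos = parse_expr(tokens, pos + 1)
    return pos


def recognizes(text):
    """Return True if text is exactly one expr."""
    try:
        tokens = tokenize(text)
        return parse_expr(tokens, 0) == len(tokens)
    except SyntaxError:
        return False


examples = ["a", "a[b,c]", "a[b[c], d]", "func(a)", "func(a,b,c)",
            "func[a](b,c)", "func(a[b], c[d])", "func[a](b[c])"]
for example in examples:
    print(example, recognizes(example))
```

The recursion here is harmless because every call to parse_expr consumes an IDENT before it can recurse through index or call.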
I'm assuming your capitalized Expression is an error. You probably meant to type it in lowercase.
How can you say there's no ambiguity? expressions call functionArguments, functionArguments calls expressions. -1

Ambiguous grammar with Lemon Parser Generator

I want to parse structured CSS code in PHP, using a lexer/parser generated by the PEAR packages PHP_LexerGenerator and PHP_ParserGenerator. My goal is to parse files like this:
selector, selector2 {
    prop: value;
    prop2 /*comment */ :
        value;
    subselector {
        prop: value;
        subsub { prop: value; }
    }
}
This all works fine as long as I don't have pseudo-classes. Pseudo-classes allow appending : and a CSS name ([a-z][a-z0-9]*) to an element, as in a.menu:visited. Being somewhat lazy, the parser has no list of valid pseudo-classes and accepts anything as the class name.
My grammar (ignoring all the special cases and whitespace) looks like this:
document ::= (<rule>)*
rule ::= <selector> '{' (<content>)* '}'
content ::= <rule>
content ::= <definition>
definition ::= <name> ':' <name> ';'
// h1 .class.class2#id :visited
<selector> ::= <name> (('.'|'#') <name>)* (':' <name>)?
Now, when I try to parse the following
h1 {
    test:visited {
        simple: case;
    }
}
The parser complains that it expected a <name> to follow the double colon. So it tries to read the simple: as a <selector> (just look at the syntax highlighting of SO).
Is it my error that the parser cannot backtrack far enough to try the <definition> rule? Or is Lemon just not powerful enough to express this? If so, what can I do to get a parser working with this grammar?
Your question asks about PHP_ParserGenerator and PHP_LexerGenerator. The parser generator code is marked as 'not maintained', which bodes ill.
The syntax you are using for the grammar is not acceptable to Lemon, so you need to clarify why you think the parser generator should accept it. You mention a problem with 'expected a <name> to follow the double colon', but neither your grammar nor your sample input has a double colon, which makes it hard to help you.
I think this Lemon grammar is equivalent to the one you showed:
document ::= rule_list.
rule_list ::= .
rule_list ::= rule_list rule.
rule ::= selector LBRACE content_list RBRACE.
content_list ::= .
content_list ::= content_list content.
content ::= rule.
content ::= definition.
definition ::= NAME COLON NAME SEMICOLON.
selector ::= NAME opt_dothashlist opt_colonname.
opt_dothashlist ::= .
opt_dothashlist ::= dot_or_hash NAME.
dot_or_hash ::= DOT.
dot_or_hash ::= HASH.
opt_colonname ::= COLON NAME.
However, when it is compiled, Lemon complains about '1 parsing conflicts' and the output file shows:
State 2:
definition ::= NAME * COLON NAME SEMICOLON
selector ::= NAME * opt_dothashlist opt_colonname
(10) opt_dothashlist ::= *
opt_dothashlist ::= * dot_or_hash NAME
dot_or_hash ::= * DOT
dot_or_hash ::= * HASH
COLON shift 10
COLON reduce 10 ** Parsing conflict **
DOT shift 13
HASH shift 12
opt_dothashlist shift 5
dot_or_hash shift 7
This means it is not sure what to do with a colon; it might be the 'opt_colonname' part of a 'selector' or it might be part of a 'definition':
name1:name4 : name2:name3 ;
Did you mean to allow syntax such as that? Nominally, according to the grammar, that should be valid, but
name1:name4;
should also be valid. I think it requires 2 or 3 lookahead tokens to disambiguate these (so your grammar is not LALR(1) but LALR(3)).
Review your definition of 'selector' in particular.
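To make the lookahead point concrete, here is a rough Python sketch. The function classify and the token-kind strings are hypothetical, chosen to match the terminal names in the Lemon grammar above. It shows that NAME COLON NAME is a common prefix of definition and selector, so the choice can only be made on the token that follows it, which a parser with a single token of lookahead cannot see when it reaches the first COLON.

```python
def classify(tokens):
    """Classify a token-kind sequence that starts a definition or a selector."""
    if tokens[:1] != ["NAME"]:
        raise SyntaxError("expected NAME")
    # DOT or HASH right after the first NAME can only continue a selector,
    # so one token of lookahead is enough in that case.
    if tokens[1:2] in (["DOT"], ["HASH"]):
        return "selector"
    # NAME COLON NAME is a prefix of both rules; only the token after it
    # (the third token of lookahead past the first NAME) tells them apart.
    if tokens[1:3] == ["COLON", "NAME"]:
        if tokens[3:4] == ["SEMICOLON"]:
            return "definition"
        if tokens[3:4] == ["LBRACE"]:
            return "selector"
    raise SyntaxError("cannot classify from the tokens seen so far")


print(classify(["NAME", "COLON", "NAME", "SEMICOLON"]))  # simple: case;
print(classify(["NAME", "COLON", "NAME", "LBRACE"]))     # test:visited {
```

A hand-written check like this is not something Lemon does for you; it only illustrates why the grammar as written needs more than one token of lookahead.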

Resources