I am writing a parser for a set of CFG productions.
(Note: The LHS can ONLY be an uppercase letter.)
/* declarations omitted; here is the main part of the code */
void start() :
{}
{
  (
    <UPPER_CHAR> <ARROW> <STRING> ( <PIPE> <STRING> )*
  )*
}
TOKEN:
{
<ARROW: "=>" >
|
<PIPE: "|">
|
<UPPER_CHAR: (["A"-"Z"])>
}
TOKEN: {<STRING: (<LETTER> | <DIGIT> | <SYMBOL>)+ > }
This obviously misses some edge cases, for example:
A => A | a | D E => e
So what did I do wrong?
I guess SYMBOL includes "=" and ">" but not "|". In that case, STRING will match the whole of " D E => e".
Why do you want STRING at all? Why not do something like this:
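void start() : {} {
  (
    <UPPER_CHAR> <ARROW>
    choices()
  )*
}

void choices() : {} {
  choice() ( <PIPE> choice() )*
}

// Assumes a token such as < LOWER_CHAR: ["a"-"z"] > is also defined.
void choice() : {} {
    LOOKAHEAD( <UPPER_CHAR> <ARROW> )
    {}                                      // the next rule starts here: end this choice
  |
    ( <UPPER_CHAR> | <LOWER_CHAR> ) choice()
  |
    {}                                      // anything else: end this choice
}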
The reason I used recursion for choice() is that there is no way to use syntactic lookahead to exit a loop. That is, what you want is (<UPPER_CHAR> | <LOWER_CHAR>)*, but you want to leave this loop as soon as the next two tokens are <UPPER_CHAR> <ARROW>.
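In hand-written code the loop can simply check the next two tokens before each iteration, which is the check JavaCC cannot attach to a loop exit. Here is a minimal plain-Java recursive-descent sketch (not JavaCC output; the class and method names are made up) that assumes every grammar symbol is a single letter, treats "|" and "=>" as separate tokens, and ends a choice as soon as the next two tokens are an uppercase letter followed by "=>".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written parser (not JavaCC output) for rules such as
// "A => A | a | D E => e", where every grammar symbol is a single letter.
class RuleParser {
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    RuleParser(String input) {
        // Tokenizer: "=>", "|", or one letter per token; whitespace is skipped.
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (input.startsWith("=>", i)) {
                toks.add("=>");
                i += 2;
            } else if (c == '|' || Character.isLetter(c)) {
                toks.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
    }

    private String peek(int k) {
        return pos + k < toks.size() ? toks.get(pos + k) : null;
    }

    private static boolean isUpper(String t) {
        return t != null && Character.isUpperCase(t.charAt(0));
    }

    private static boolean isLetter(String t) {
        return t != null && Character.isLetter(t.charAt(0));
    }

    private void expect(String what, boolean ok) {
        if (!ok) throw new IllegalStateException("Expected " + what + " at token " + pos);
        pos++;
    }

    // start: ( UPPER "=>" choices )*  ; returns one normalized line per rule
    List<String> start() {
        List<String> rules = new ArrayList<>();
        while (peek(0) != null) {
            String lhs = peek(0);
            expect("an uppercase letter", isUpper(lhs));
            expect("=>", "=>".equals(peek(0)));
            rules.add(lhs + " => " + String.join(" | ", choices()));
        }
        return rules;
    }

    // choices: choice ( "|" choice )*
    private List<String> choices() {
        List<String> alts = new ArrayList<>();
        alts.add(choice());
        while ("|".equals(peek(0))) {
            pos++;
            alts.add(choice());
        }
        return alts;
    }

    // choice: a run of letters that stops as soon as the next two tokens are UPPER "=>"
    private String choice() {
        List<String> symbols = new ArrayList<>();
        while (isLetter(peek(0)) && !(isUpper(peek(0)) && "=>".equals(peek(1)))) {
            symbols.add(peek(0));
            pos++;
        }
        return String.join(" ", symbols);
    }
}
```

On the example input A => A | a | D E => e, start() returns the two rules "A => A | a | D" and "E => e": the D is kept in the first rule, and the E is left for the next rule because it is followed by "=>".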
I am new to JavaCC. I am trying to define a token that can match any string. I am using the regex syntax <ANY: (~[])+>, which is not working. I want to achieve something very simple: define an expression with the following BNF:
<exp> ::= "path(" <string> "," <number> ")"
My current .jj file is as follows. Any help on how I can parse the string?
options
{
}
PARSER_BEGIN(SimpleAdd)
package SimpleAddTest;
public class SimpleAdd
{
}
PARSER_END(SimpleAdd)
SKIP :
{
" "
| "\r"
| "\t"
| "\n"
}
TOKEN:
{
< NUMBER: (["0"-"9"])+ > |
<PATH: "path"> |
<RPAR: "("> |
<LPAR: ")"> |
<QUOTE: "'"> |
<COMMA: ","> |
<ANY: (~[])+>
}
int expr():
{
String leftValue ;
int rightValue ;
}
{
<PATH> <RPAR> <QUOTE> leftValue = str() <QUOTE> <COMMA> rightValue = num() <LPAR>
{ return 0; }
}
String str():
{
Token t;
}
{
t = <ANY> { return t.toString(); }
}
int num():
{
Token t;
}
{
t = <NUMBER> { return Integer.parseInt(t.toString()); }
}
The error I am getting with the above javacc file is:
Exception in thread "main" SimpleAddTest.ParseException: Encountered " <ANY> "path(\'5\',1) "" at line 1, column 1.
Was expecting:
"path" ...
The pattern <ANY: (~[])+> will indeed match any nonempty string. The issue is that this is not what you really want. If you have a rule <ANY: (~[])+>, it will match the whole file, unless the file is empty. In most cases, because of the longest match rule, the whole file will be tokenized as [ANY, EOF]. Is that really what you want? Probably not.
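The longest-match rule can be reproduced in a few lines of plain Java. The sketch below only approximates the JavaCC token manager and is not its real implementation: at each position it tries every token pattern, keeps the longest match and, on a tie, the pattern declared first. The regexes are my hand translations of the question's tokens, with (?s).+ standing in for (~[])+, and whitespace skipping is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Approximates JavaCC's maximal-munch rule: the longest match wins,
// ties go to the token declared first. Not the real token manager.
class LongestMatchLexer {
    private final Map<String, Pattern> tokens = new LinkedHashMap<>();  // keeps declaration order

    LongestMatchLexer define(String name, String regex) {
        tokens.put(name, Pattern.compile(regex));
        return this;
    }

    List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String best = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> e : tokens.entrySet()) {
                Matcher m = e.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors at the region start; only a strictly longer
                // match replaces the current best, so ties keep the earlier token.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = e.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) throw new IllegalStateException("Lexical error at offset " + pos);
            out.add(best);
            pos += bestLen;
        }
        out.add("EOF");
        return out;
    }

    // The question's token set, in declaration order.
    static LongestMatchLexer questionTokens() {
        return new LongestMatchLexer()
                .define("NUMBER", "[0-9]+")
                .define("PATH", "path")
                .define("RPAR", "\\(")
                .define("LPAR", "\\)")
                .define("QUOTE", "'")
                .define("COMMA", ",")
                .define("ANY", "(?s).+");
    }
}
```

With the question's input, LongestMatchLexer.questionTokens().tokenize("path('5',1)") returns [ANY, EOF]: ANY matches all 11 characters, while PATH matches only 4.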
So I'm going to guess at what you really want. I'll guess you want any string that doesn't include a double quote character. Maybe there are other restrictions, such as no nonprinting characters. Maybe you want to allow double quotes if they are preceded by a backslash. Who knows? Adjust as needed.
Here is what you can do. First, replace the token definitions with:
TOKEN:
{
< NUMBER: (["0"-"9"])+ > |
<PATH: "path"> |
<RPAR: "("> |
<LPAR: ")"> |
<COMMA: ","> |
<STRING: "\"" (~["\""])* "\"" >
}
Then change your grammar to:
int expr():
{
String leftValue ;
int rightValue ;
}
{
<PATH> <RPAR> leftValue=str() <COMMA> rightValue = num() <LPAR>
{ return 0; }
}
String str():
{
Token t;
int len ;
}
{
t = <STRING>
{ len = t.image.length() ; }
{ return t.image.substring(1,len-1); }
}
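To see what the new STRING token accepts and what str() returns, here is a small plain-Java sketch using java.util.regex. The pattern "[^"]*" is my hand translation of "\"" (~["\""])* "\"", and the class name is hypothetical. One consequence worth noting: the input now has to use double quotes, e.g. path("5",1); the single-quoted path('5',1) from the question will not produce a STRING token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring the STRING token and the str() production.
class StringToken {
    // Hand translation of <STRING: "\"" (~["\""])* "\"" >
    static final Pattern STRING = Pattern.compile("\"[^\"]*\"");

    // Returns the token image if the input starts with a STRING token, else null.
    static String match(String input) {
        Matcher m = STRING.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    // Same as str(): drop the first and last characters (the quotes).
    static String strip(String image) {
        int len = image.length();
        return image.substring(1, len - 1);
    }
}
```

For example, match("\"5\",1)") gives the image "5" with its quotes, and strip turns that into 5; an empty string literal "" strips to the empty string.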
I am trying JavaCC for the first time with a simple, naive example, which is not working. My BNF is as follows:
<exp> ::= <num> "+" <num>
<num> ::= <digit> | <digit> <num>
<digit> ::= [0-9]
Based on this BNF, I am writing the SimpleAdd.jj as follows:
options
{
}
PARSER_BEGIN(SimpleAdd)
public class SimpleAdd
{
}
PARSER_END(SimpleAdd)
SKIP :
{
" "
| "\r"
| "\t"
| "\n"
}
TOKEN:
{
< NUMBER: (["0"-"9"])+ >
}
int expr():
{
int leftValue ;
int rightValue ;
}
{
leftValue = num()
"+"
rightValue = num()
{ return leftValue+rightValue; }
}
int num():
{
Token t;
}
{
t = <NUMBER> { return Integer.parseInt(t.toString()); }
}
Using the above file, I generate the Java source classes. My main class is as follows:
public class Main {
public static void main(String [] args) throws ParseException {
SimpleAdd parser = new SimpleAdd(System.in);
int x = parser.expr();
System.out.println(x);
}
}
When I enter the expression via System.in, I get the following error:
11+11^D
Exception in thread "main" SimpleAddTest.ParseException: Encountered "<EOF>" at line 0, column 0.
Was expecting:
<NUMBER> ...
at SimpleAddTest.SimpleAdd.generateParseException(SimpleAdd.java:200)
at SimpleAddTest.SimpleAdd.jj_consume_token(SimpleAdd.java:138)
at SimpleAddTest.SimpleAdd.num(SimpleAdd.java:16)
at SimpleAddTest.SimpleAdd.expr(SimpleAdd.java:7)
at SimpleAddTest.Main.main(Main.java:9)
Any hints on how to solve the problem?
Edit: Note that this answer addresses an earlier version of the question.
When a BNF production uses a nonterminal that returns a result, you can record that result in a variable.
First, declare the variables in the declaration part of the BNF production:
int expr():
{
int leftValue ;
int rightValue ;
}
{
Second, in the main body of the production, record the results in the variables:
leftValue = num()
"+"
rightValue = num()
Finally, use the values of those variables to compute the result of this production:
{ return leftValue+rightValue; }
}
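For comparison, here is roughly what that production does, written as a hand-coded recursive-descent method in plain Java. It is a sketch, not the code JavaCC generates: the tokenizer is simplified and the class name is hypothetical, but the data flow is the same: call num(), keep each result in a local variable, and combine the locals at the end.

```java
// Hypothetical hand-written analogue of
//   int expr(): { int leftValue; int rightValue; }
//   { leftValue = num() "+" rightValue = num() { return leftValue + rightValue; } }
// Not the code JavaCC generates; it only shows the same data flow.
class SimpleAddSketch {
    private final String in;
    private int pos = 0;

    SimpleAddSketch(String in) { this.in = in; }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }  // ["0"-"9"]

    private void skipWhitespace() {  // plays the role of the SKIP block
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
    }

    // num(): consume one <NUMBER> token and return its value
    int num() {
        skipWhitespace();
        int start = pos;
        while (pos < in.length() && isDigit(in.charAt(pos))) pos++;
        if (pos == start) throw new IllegalStateException("Expected <NUMBER> at offset " + start);
        return Integer.parseInt(in.substring(start, pos));
    }

    // expr(): record each num() result in a local variable, then combine them
    int expr() {
        int leftValue = num();
        skipWhitespace();
        if (pos >= in.length() || in.charAt(pos) != '+')
            throw new IllegalStateException("Expected \"+\" at offset " + pos);
        pos++;
        int rightValue = num();
        return leftValue + rightValue;
    }
}
```

new SimpleAddSketch("11+11").expr() evaluates to 22, which is what the JavaCC version returns for the same input.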
I want to implement the following coding rule in my parser generated by JavaCC:
Do not change a loop variable inside a for-loop block.
The JavaCC production for the for-loop block is:
void MyMethod () : {}
{
"(" Argument () ")" {}
(Statement ()) *
}
void Statement () : {}
{
expressionFOR()
}
void expressionFOR() :{}
{
<For> <id> "= " 1 <to> 100
int J
int kk =SUM( , J)
......
}
Thank you very much in advance.
Assuming you are using JJTree with MULTI=false and VISITOR=true, you could write a visitor along these lines:
public Object visit(SimpleNode node, Object data) {
    if (this is a for-loop node) {
        push the for-loop variable onto a stack of variables ;
        node.childrenAccept(this, null) ;
        pop the stack ;
    } else {
        if (this is an assignment statement node
            and the target variable is on the stack)
            report that the rule is violated ;
        node.childrenAccept(this, null) ;
    }
    return null ;
}
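The same idea can be tried without JJTree. Below is a self-contained Java sketch over a tiny hypothetical AST; the AstNode, BlockNode, ForNode and AssignNode classes are invented for illustration and are not JJTree node types. A for-loop pushes its variable, its body is visited, the variable is popped, and any assignment whose target is still on the stack is recorded as a violation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical mini AST standing in for JJTree's SimpleNode tree.
abstract class AstNode {
    final List<AstNode> children;
    AstNode(AstNode... children) { this.children = Arrays.asList(children); }
}

class BlockNode extends AstNode {
    BlockNode(AstNode... statements) { super(statements); }
}

class ForNode extends AstNode {
    final String loopVar;
    ForNode(String loopVar, AstNode... body) { super(body); this.loopVar = loopVar; }
}

class AssignNode extends AstNode {
    final String target;
    AssignNode(String target) { super(); this.target = target; }
}

// Reports assignments to the variable of an enclosing for-loop.
class LoopVarChecker {
    private final Deque<String> loopVars = new ArrayDeque<>();
    final List<String> violations = new ArrayList<>();

    void visit(AstNode node) {
        if (node instanceof ForNode) {
            loopVars.push(((ForNode) node).loopVar);   // entering the loop body
            for (AstNode child : node.children) visit(child);
            loopVars.pop();                            // leaving the loop body
        } else {
            if (node instanceof AssignNode && loopVars.contains(((AssignNode) node).target)) {
                violations.add("loop variable '" + ((AssignNode) node).target + "' modified");
            }
            for (AstNode child : node.children) visit(child);
        }
    }
}
```

In this model the loop header's own initialization is stored in ForNode.loopVar rather than as an AssignNode, so it is not flagged; only assignments inside the body are. An assignment to i after the loop has ended is also not flagged, because i has been popped by then.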
I have made an AssignStatement class and I am trying to parse a string using JavaCC.
The assignment statement is of the form: a=b+c*d.
Here is the source code:
options
{
static=false;
DEBUG_TOKEN_MANAGER=true;
}
PARSER_BEGIN(AssignStatement)
public class AssignStatement
{
public static void main(String s[])
{
try
{
AssignStatement as=new AssignStatement(System.in);
as.StartSymbol();
System.out.println("Syntax checking successfully");
}
catch(Throwable e)
{
System.out.println("Syntex checking failed"+e.getMessage());
}
}
}
PARSER_END(AssignStatement)
SKIP: { " " | "\t" | "\n" | "\r" }
TOKEN:{ "(" | ")" | "+" | "*" | ":=" | <NUM:(["0"-"9"])+>| <ID:(["a"-"z"])+> }
void StartSymbol(): {}
{
(AStmt())*<EOF>
}
void AStmt(): {}
{
LOOKAHEAD(2) <ID> "=" AStmt()
| Term() ("+" Term())*
}
void Term(): {}
{
Factor() ("*" Factor())*
}
void Factor(): {}
{
<NUM>
| <ID>
| "(" AStmt() ")"
}
The output I got after running java AssignStatement:
"a=10+20*30"
Current character : \" (34) at line 1 column 1
No string literal matches possible.
Starting NFA to match one of : { , }
Current character : \" (34) at line 1 column 1
Syntex checking failedLexical error at line 1, column 1. Encountered: "\"" (34)
, after : ""
The output I should get:
Syntax checking successfully
The first character of your input is ", but there is no regular expression that allows the first character to be a ". So the lexer throws a TokenMgrError after reading the first character. The quotation marks are not part of the assignment statement; type a=10+20*30 without them.
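Here is a plain-Java sketch of the same grammar (StartSymbol, AStmt, Term, Factor) with a hand-written lexer, to show where the failure happens. It is an approximation, not the generated parser, and the class name is hypothetical. It recognizes the tokens the grammar uses ("(", ")", "+", "*", "=", NUM, ID) and skips blanks, tabs and newlines; any other character is rejected before parsing starts, which is what happens with the leading double quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written analogue of the question's grammar.
class AssignSketch {
    private final List<String> toks = new ArrayList<>();  // token kinds only
    private int pos = 0;

    AssignSketch(String input) {
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') { i++; continue; }  // SKIP
            if ("()+*=".indexOf(c) >= 0) { toks.add(String.valueOf(c)); i++; continue; }
            if (c >= '0' && c <= '9') {                                             // <NUM>
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') i++;
                toks.add("NUM");
                continue;
            }
            if (c >= 'a' && c <= 'z') {                                             // <ID>
                while (i < input.length() && input.charAt(i) >= 'a' && input.charAt(i) <= 'z') i++;
                toks.add("ID");
                continue;
            }
            // No token can start with c: this corresponds to JavaCC's lexical error.
            throw new IllegalArgumentException("Lexical error at column " + (i + 1) + ": '" + c + "'");
        }
        toks.add("EOF");
    }

    private String peek(int k) { return toks.get(Math.min(pos + k, toks.size() - 1)); }

    private void eat(String kind) {
        if (!peek(0).equals(kind)) throw new IllegalStateException("Expected " + kind + ", got " + peek(0));
        pos++;
    }

    void startSymbol() {                                    // (AStmt())* <EOF>
        while (!peek(0).equals("EOF")) aStmt();
        eat("EOF");
    }

    private void aStmt() {
        if (peek(0).equals("ID") && peek(1).equals("=")) {  // LOOKAHEAD(2) <ID> "="
            eat("ID");
            eat("=");
            aStmt();
        } else {                                            // Term() ("+" Term())*
            term();
            while (peek(0).equals("+")) { eat("+"); term(); }
        }
    }

    private void term() {                                   // Factor() ("*" Factor())*
        factor();
        while (peek(0).equals("*")) { eat("*"); factor(); }
    }

    private void factor() {                                 // <NUM> | <ID> | "(" AStmt() ")"
        switch (peek(0)) {
            case "NUM": case "ID": pos++; break;
            case "(": eat("("); aStmt(); eat(")"); break;
            default: throw new IllegalStateException("Unexpected " + peek(0));
        }
    }
}
```

new AssignSketch("a=10+20*30").startSymbol() succeeds, while new AssignSketch("\"a=10+20*30\"") fails in the lexer at column 1, mirroring the error in the question.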
I have a JavaCC grammar with the following definitions:
<REGULAR_IDENTIFIER : (["A"-"Z"])+ > // simple identifier like say "DODGE"
<_LABEL : (["A"-"Z"])+ (":") > // label, eg "DODGE:"
<DOUBLECOLON : "::">
<COLON : ":">
Right now "DODGE::" is lexed as <_LABEL> <COLON> ("DODGE:" ":"),
but I need to lex it as <REGULAR_IDENTIFIER> <DOUBLECOLON> ("DODGE" "::").
I think the following will work:
MORE : { < (["A"-"Z"])+ > : S0 } // Could be an identifier or a label.
<S0> TOKEN : { < LABEL : ":" > : DEFAULT } // label, e.g. "DODGE:"
<S0> TOKEN : { < IDENTIFIER : "" > : DEFAULT } // simple identifier, e.g. "DODGE"
<S0> TOKEN : { < IDENTIFIER : "::" > { matchedToken.image = image.substring(0, image.length() - 2); } : S1 }
<S1> TOKEN : { < DOUBLECOLON : "" > { matchedToken.image = "::"; } : DEFAULT }
TOKEN : { < DOUBLECOLON : "::" > | < COLON : ":" > }
Note that "DODGE:::" is three tokens, not two.
JavaCC uses the maximal match rule (longest prefix match rule); see:
http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-moz.htm#more-than-one
This means that the _LABEL token is preferred over the REGULAR_IDENTIFIER token, because the _LABEL token matches more characters. So what you are trying to do should not be done in the tokenizer.
I have written a parser that recognizes the grammar correctly. It uses the parser, instead of the tokenizer, to recognize the _LABELs:
options {
STATIC = false;
}
PARSER_BEGIN(Parser)
import java.io.StringReader;
public class Parser {
//Main method, parses the first argument to the program
public static void main(String[] args) throws ParseException {
System.out.println("Parsing: " + args[0]);
Parser parser = new Parser(new StringReader(args[0]));
parser.Start();
}
}
PARSER_END(Parser)
//The _LABEL will be recognized by the parser, not the tokenizer
TOKEN :
{
<DOUBLECOLON : "::"> //The double colon token will be preferred to the single colon due to the maximal munch rule
|
<COLON : ":">
|
<REGULAR_IDENTIFIER : (["A"-"Z"])+ > // simple identifier like say "DODGE"
}
/** Root production. */
void Start() :
{}
{
(
LOOKAHEAD(2) //We need a lookahead of two, to see if this is a label or not
<REGULAR_IDENTIFIER> <COLON> { System.err.println("label"); } //Labels; should probably be put in their own production
| <REGULAR_IDENTIFIER> { System.err.println("reg_id"); } //Regular identifiers
| <DOUBLECOLON> { System.err.println("DC"); }
| <COLON> { System.err.println("C"); }
)+
}
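The two-stage approach can also be tried in plain Java. This sketch (hypothetical names, not generated code) first lexes with maximal munch over DOUBLECOLON, COLON and REGULAR_IDENTIFIER, then applies the same two-token lookahead as Start() to decide whether an identifier is a label.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: lex with maximal munch, then classify with 2-token lookahead.
class LabelSketch {
    static List<String> lex(String s) {
        List<String> toks = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (s.startsWith("::", i)) {            // the longer match beats ":"
                toks.add("DC");
                i += 2;
            } else if (c == ':') {
                toks.add("C");
                i++;
            } else if (c >= 'A' && c <= 'Z') {      // (["A"-"Z"])+
                while (i < s.length() && s.charAt(i) >= 'A' && s.charAt(i) <= 'Z') i++;
                toks.add("ID");
            } else {
                throw new IllegalArgumentException("Lexical error at offset " + i);
            }
        }
        return toks;
    }

    // Mirrors Start(): ID followed by C is a label; otherwise each token stands alone.
    static List<String> classify(List<String> toks) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < toks.size(); i++) {
            String t = toks.get(i);
            if (t.equals("ID") && i + 1 < toks.size() && toks.get(i + 1).equals("C")) {
                out.add("label");
                i++;                                // the COLON belongs to the label
            } else if (t.equals("ID")) {
                out.add("reg_id");
            } else {
                out.add(t);
            }
        }
        return out;
    }
}
```

For example, "DODGE::" comes out as reg_id followed by DC, "DODGE:" as a single label, and "DODGE:::" as reg_id, DC, C.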
In a real grammar you should, of course, move the <REGULAR_IDENTIFIER> <COLON> into a separate _label production.
Hope it helps.