JavaCC simple example not working

I am trying JavaCC for the first time with a simple, naive example that is not working. My BNF is as follows:
<exp>:= <num>"+"<num>
<num>:= <digit> | <digit><num>
<digit>:= [0-9]
Based on this BNF, I am writing the SimpleAdd.jj as follows:
options
{
}
PARSER_BEGIN(SimpleAdd)
public class SimpleAdd
{
}
PARSER_END(SimpleAdd)
SKIP :
{
" "
| "\r"
| "\t"
| "\n"
}
TOKEN:
{
< NUMBER: (["0"-"9"])+ >
}
int expr():
{
int leftValue ;
int rightValue ;
}
{
leftValue = num()
"+"
rightValue = num()
{ return leftValue+rightValue; }
}
int num():
{
Token t;
}
{
t = <NUMBER> { return Integer.parseInt(t.toString()); }
}
Using the above file, I generate the Java source classes. My main class is as follows:
public class Main {
public static void main(String [] args) throws ParseException {
SimpleAdd parser = new SimpleAdd(System.in);
int x = parser.expr();
System.out.println(x);
}
}
When I enter the expression via System.in, I get the following error:
11+11^D
Exception in thread "main" SimpleAddTest.ParseException: Encountered "<EOF>" at line 0, column 0.
Was expecting:
<NUMBER> ...
at SimpleAddTest.SimpleAdd.generateParseException(SimpleAdd.java:200)
at SimpleAddTest.SimpleAdd.jj_consume_token(SimpleAdd.java:138)
at SimpleAddTest.SimpleAdd.num(SimpleAdd.java:16)
at SimpleAddTest.SimpleAdd.expr(SimpleAdd.java:7)
at SimpleAddTest.Main.main(Main.java:9)
Any hints on how to solve the problem?

Edit: Note that this answer addresses an earlier version of the question.
When a BNF production uses a nonterminal that returns a result, you can record that result in a variable.
First, declare the variables in the declaration part of the BNF production:
int expr():
{
int leftValue ;
int rightValue ;
}
{
Second, in the main body of the production, record the results in the variables.
leftValue = num()
"+"
rightValue = num()
Finally, use the values of those variables to compute the result of this production.
{ return leftValue+rightValue; }
}
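For comparison, here is a hand-written Java sketch of the same control flow: each call to num() returns a value, the caller records it in a local variable, and the final action combines the two. The class name AddSketch and its error messages are hypothetical; this is not the code JavaCC generates, and it skips no whitespace.

```java
// Hand-written sketch of the control flow that expr() and num() describe.
// AddSketch is a hypothetical name; this is not JavaCC output.
class AddSketch {
    private final String input;
    private int pos = 0;

    AddSketch(String input) {
        this.input = input;
    }

    // num: one or more digits, returned as an int (plays the role of <NUMBER>).
    int num() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalStateException("Expected a number at position " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    // expr: num "+" num; each result is recorded in a local variable first.
    int expr() {
        int leftValue = num();
        if (pos >= input.length() || input.charAt(pos) != '+') {
            throw new IllegalStateException("Expected '+' at position " + pos);
        }
        pos++; // consume "+"
        int rightValue = num();
        return leftValue + rightValue;
    }

    public static void main(String[] args) {
        System.out.println(new AddSketch("11+11").expr()); // prints 22
    }
}
```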

Related

Error Recovery in JavaCC for Qbasic language

I am developing a compiler (with JavaCC) for the QBasic language and I have an issue related to error recovery (by error recovery I mean reporting all compiler errors when the program is compiled),
so I have to handle each ParseException and skip the line where it occurs.
Note: QBasic has no semicolons, so every statement is on a separate line.
I have tried to catch the ParseException in every statement and handle it by calling getNextToken repeatedly until I get a "\n" token.
Unfortunately, that does not work!
Here is my program method:
void program():
{
Node n =null;
programNode ret = new programNode() ;
boolean canrun=true;
}
{
(< LINE > | < SPACE >)*
(
try {
n = statement()(< SPACE >)* <LINE>
}
catch(ParseException e)
{
canrun=false;
Excep.add(e);
Token t;
do
{
t=CodeParserTokenManager.getNextToken();
}while (t.image!="\n");
}
(< LINE > | < SPACE >)*{
if (n!=null)
ret.addChild(n);
})+ "?"
{
if (canrun)
ret.Start();
}
}
And here is my Parser class :
PARSER_BEGIN(CodeParser)
import java.util.ArrayList;
public class CodeParser
{
public static void main(String args[])
{
CodeParser Parser = new CodeParser(System.in);
try {
program() ;
}
catch(ParseException e)
{
}
}
}
PARSER_END(CodeParser)
I believe the problem is the line:
}while (t.image!="\n");
because 1) you shouldn't compare strings with != in Java, since it compares object references rather than contents, and 2) the image could be different ("\r\n", for instance).
Try t.kind!=LINE.
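A small, self-contained Java sketch of both points. The class name TokenCompareSketch is hypothetical, and the string built with new String(...) merely stands in for a token image created at run time.

```java
// Sketch: why comparing a token image with != is unreliable.
// TokenCompareSketch is a hypothetical name used only for this illustration.
class TokenCompareSketch {
    // Reference comparison: true only when both names point at the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String image = new String("\n"); // stands in for an image built at run time
        System.out.println(sameObject(image, "\n")); // false: different objects
        System.out.println(image.equals("\n"));      // true: same contents
        System.out.println("\r\n".equals("\n"));     // false: a Windows line end differs
    }
}
```

Comparing the token kind, as suggested above, avoids both problems because it does not depend on the matched text at all.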

Javacc error reporting results in “Expansion can be matched by empty string.”

I am trying to add some custom error messages to my javacc parser to hopefully make the error messages more specific and the language problems easier to find and correct.
The first error that I am trying to focus in on is how to detect that the correct number of arguments have been provided to a 'function' call. Rather than the default message, I would like to print out something like "missing argument to function".
My simplified language and my attempt to catch a missing argument error looks something like:
double arg(boolean allowMissing):
{ double v; Token t; }
{
t = <INT> { return Double.parseDouble(t.image); }
| t = <DOUBLE> { return Double.parseDouble(t.image); }
| v = functions() { return v; }
| { if (!allowMissing) throw new ParseException("Missing argument");} // #1 Throw error if missing argument
}
double functions() :
{ double v1, v2, result;
double[] array;
}
{
(<MIN> "(" v1=arg(false) "," v2=arg(false) ")") { return (v1<v2)?v1:v2; }
| (<MAX> "(" v1=arg(false) "," v2=arg(false) ")") { return (v1>v2)?v1:v2; }
| (<POW> "(" v1=arg(false) "," v2=arg(false) ")") { return Math.pow(v1, v2); }
| (<SUM> "(" array=argList() ")") { result=0; for (double v:array) result+=v; return result;}
}
double[] argList() :
{
ArrayList<Double> list = new ArrayList<>();
double v;
}
{
( (v=arg(true) { list.add(v);} ( "," v=arg(false) {list.add(v);} )*)?) { // #2 Expansion can be matched by empty string here
double[] arr = new double[list.size()];
for (int i=0; i<list.size(); i++)
arr[i] = list.get(i);
return arr;
}
}
As you can see functions will recursively resolve their arguments, and this allows function call to be nested.
Here are a few valid expressions that can be parsed in this language:
"min(1,2)",
"max(1,2)",
"max(pow(2,2),2)",
"sum(1,2,3,4,5)",
"sum()"
Here is an invalid expression:
"min()"
This all worked well until I tried to check for missing arguments (code location #1). The check works fine for the functions that have a fixed number of arguments. The problem is that the sum function (code location #2) is allowed to have zero arguments. I even passed in a flag so that no error is thrown when missing arguments are allowed. However, JavaCC gives me an error at location #2: "Expansion within "(...)?" can be matched by empty string". I understand why I get this error. I have also read the answer to the question JavaCC custom errors cause "Expansion can be matched by empty string.", but it did not help me.
My problem is that I just cannot see how I can have this both ways. I want to throw an error for missing arguments in the functions that have a fixed number of arguments, but I don't want an error in the function that allows no arguments. Is there a way to refactor my parser so that I still use the recursive style, catch missing arguments from the functions that take a fixed arguments, yet allow some functions to have zero arguments?
Or is there a better way to add in custom error messages? I am not really seeing much in the documentation.
Also, any pointers to examples that use more sophisticated error reporting would be greatly appreciated. I am actually using jjtree, but I simplified it down for this example.
Here's what I would do.
Instead of using a boolean argument in function arg, I would use the ? operator:
double arg():
{ double v; Token t; }
{
t = <INT> { return Double.parseDouble(t.image); }
| t = <DOUBLE> { return Double.parseDouble(t.image); }
| v = functions() { return v; }
}
double functions() :
{ double v1=0, v2=0, result;
double[] array;
}
{
(<MIN> "(" (v1=arg())? "," (v2=arg())? ")") { return (v1<v2)?v1:v2; }
| (<MAX> "(" (v1=arg())? "," (v2=arg())? ")") { return (v1>v2)?v1:v2; }
| (<POW> "(" (v1=arg())? "," (v2=arg())? ")") { return Math.pow(v1, v2); }
| (<SUM> "(" array=argList() ")") { result=0; for (double v:array) result+=v; return result;}
}
double[] argList() :
{
List<Double> list = new ArrayList<Double>();
double v;
}
{
( (v=arg() { list.add(v); } | { list.add(0.); } )
( "," (v=arg() { list.add(v); } | { list.add(0.); } ) )*) {
double[] arr = new double[list.size()];
for (int i=0; i<list.size(); i++)
arr[i] = list.get(i);
return arr;
}
}
You could do this:
double arg():
{ double v; Token t; }
{
t = <INT> { return Double.parseDouble(t.image); }
| t = <DOUBLE> { return Double.parseDouble(t.image); }
| v = functions() { return v; }
}
double argRequired():
{ double v; }
{
v = arg() { return v ; }
| { throw new ParseException("Missing argument"); } // #1 Throw error if missing argument
}
double argOptional( double defaultValue ): // Not needed for this example, but might be useful.
{ double v; }
{
v = arg() { return v ; }
| { return defaultValue ; }
}
double functions() :
{ double v1, v2, result;
double[] array;
}
{
(<MIN> "(" v1=argRequired() "," v2=argRequired() ")") { return (v1<v2)?v1:v2; }
| (<MAX> "(" v1=argRequired() "," v2=argRequired() ")") { return (v1>v2)?v1:v2; }
| (<POW> "(" v1=argRequired() "," v2=argRequired() ")") { return Math.pow(v1, v2); }
| (<SUM> "(" array=argList() ")") { result=0; for (double v:array) result+=v; return result;}
}
double[] argList( ) :
{
ArrayList<Double> list = new ArrayList<>();
double v;
}
{
( v=arg() { list.add(v);}
( "," v=argRequired() {list.add(v);}
)*
)?
{
double[] arr = new double[list.size()];
for (int i=0; i<list.size(); i++)
arr[i] = list.get(i);
return arr;
}
}

Javacc grammar not working with optional tokens

I have a parser, created with JavaCC, for DFM files (a DFM is a Delphi source file, similar to JSON, that defines form component layouts).
My grammar (.jj file) defines this:
private DfmObject dfmObject():
{
DfmObject res = new DfmObject();
DfmProperty prop;
DfmObject obj;
Token tName;
Token tType;
}
{
<OBJECT>
(tName = <IDENTIFIER> { res.setName(tName.image); } <COLON>)?
tType = <IDENTIFIER> { res.setType(tType.image); }
<ENDLINE>
( prop = property() { res.addProperty(prop); } )*
( obj = dfmObject() { res.addChild(obj); } (<ENDLINE>)*)*
<END>
{ return res; }
}
This is for parsing 2 types of object definitions:
object name: Type
end
as well as
object Type
end
So, the "name:" part is optional.
But, when I try to parse this second DFM, I always get this error:
Exception in thread "main" eu.kaszkowiak.jdfm.parser.ParseException: Encountered " <ENDLINE> "\r\n"" at line 1, column 12.
Was expecting:
":" ...
What am I doing wrong?
A solution/workaround is to make the ": Type" part optional and to swap the name and type values when the type is null.
See the grammar implementation:
private DfmObject dfmObject():
{
DfmObject res = new DfmObject();
DfmProperty prop;
DfmObject obj;
Token tName;
Token tType;
}
{
(
<OBJECT>
(
tName = <IDENTIFIER> { res.setName(tName.image); }
)
( <COLON> tType = <IDENTIFIER> { res.setType(tType.image); } )?
<ENDLINE>
)
( prop = property() { res.addProperty(prop); } )*
( obj = dfmObject() { res.addChild(obj); } (<ENDLINE>)*)*
<END>
{
if (res.getType() == null) {
res.setType(res.getName());
res.setName(null);
}
return res;
}
}
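The original grammar fails because JavaCC, by default, looks only one token ahead at a choice point: after <OBJECT> it sees an <IDENTIFIER>, commits to the optional "name :" part, and then demands a <COLON>. The Java sketch below simulates that decision by hand with one and with two tokens of lookahead. The class LookaheadSketch and the string token kinds are hypothetical stand-ins, not generated parser code. In a .jj file the two-token variant would presumably correspond to putting LOOKAHEAD(2) in front of the optional part, which is untested against this grammar.

```java
import java.util.Arrays;
import java.util.List;

// Simulates the header rule: OBJECT ( IDENTIFIER COLON )? IDENTIFIER ENDLINE
// LookaheadSketch and the string token kinds are hypothetical stand-ins.
class LookaheadSketch {
    static String kindAt(List<String> tokens, int i) {
        return i < tokens.size() ? tokens.get(i) : "EOF";
    }

    static int expect(List<String> tokens, int pos, String kind) {
        String found = kindAt(tokens, pos);
        if (!found.equals(kind)) {
            throw new IllegalStateException(
                "Encountered " + found + ", was expecting " + kind);
        }
        return pos + 1;
    }

    // k = number of tokens inspected before entering the optional part.
    static String parseHeader(List<String> tokens, int k) {
        int pos = expect(tokens, 0, "OBJECT");
        boolean enterOptional = kindAt(tokens, pos).equals("IDENTIFIER")
                && (k < 2 || kindAt(tokens, pos + 1).equals("COLON"));
        if (enterOptional) {
            pos = expect(tokens, pos, "IDENTIFIER");
            pos = expect(tokens, pos, "COLON");
        }
        pos = expect(tokens, pos, "IDENTIFIER");
        expect(tokens, pos, "ENDLINE");
        return "ok";
    }

    public static void main(String[] args) {
        List<String> unnamed = Arrays.asList("OBJECT", "IDENTIFIER", "ENDLINE");
        System.out.println(parseHeader(unnamed, 2)); // two-token lookahead succeeds
        try {
            parseHeader(unnamed, 1); // one-token lookahead commits too early
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```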

Define token to match any string

I am new to javacc. I am trying to define a token which can match any string. I am following the regex syntax <ANY: (~[])+> which is not working. I want to achieve something very simple, define an expression having the following BNF:
<exp> ::= "path(" <string> "," <number> ")"
My current .jj file is as follows. Any help on how I can parse the string would be appreciated:
options
{
}
PARSER_BEGIN(SimpleAdd)
package SimpleAddTest;
public class SimpleAdd
{
}
PARSER_END(SimpleAdd)
SKIP :
{
" "
| "\r"
| "\t"
| "\n"
}
TOKEN:
{
< NUMBER: (["0"-"9"])+ > |
<PATH: "path"> |
<RPAR: "("> |
<LPAR: ")"> |
<QUOTE: "'"> |
<COMMA: ","> |
<ANY: (~[])+>
}
int expr():
{
String leftValue ;
int rightValue ;
}
{
<PATH> <RPAR> <QUOTE> leftValue = str() <QUOTE> <COMMA> rightValue = num() <LPAR>
{ return 0; }
}
String str():
{
Token t;
}
{
t = <ANY> { return t.toString(); }
}
int num():
{
Token t;
}
{
t = <NUMBER> { return Integer.parseInt(t.toString()); }
}
The error I am getting with the above javacc file is:
Exception in thread "main" SimpleAddTest.ParseException: Encountered " <ANY> "path(\'5\',1) "" at line 1, column 1.
Was expecting:
"path" ...
The pattern <ANY: (~[])+> will indeed match any nonempty string. The issue is that this is not what you really want. If you have a rule <ANY: (~[])+>, it will match the whole file, unless the file is empty. In most cases, because of the longest match rule, the whole file will be parsed as [ANY, EOF]. Is that really what you want? Probably not.
So I'm going to guess at what you really want. I'll guess you want any string that doesn't include a double quote character. Maybe there are other restrictions, such as no nonprinting characters. Maybe you want to allow double quotes if they are preceded by a backslash. Who knows? Adjust as needed.
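To see the difference between the two kinds of pattern, here is a rough Java sketch. java.util.regex only stands in for JavaCC's token manager here: ".+" with DOTALL plays the role of (~[])+, and the second pattern is a quoted string with no embedded double quotes. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: an "any character" pattern swallows everything, a delimited one does not.
// AnyVersusStringSketch is a hypothetical name; java.util.regex is only an analogy.
class AnyVersusStringSketch {
    // Returns the text matched at the start of the input, or null if nothing matches.
    static String matchAtStart(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Plays the role of (~[])+ : the match covers the whole input.
        System.out.println(matchAtStart(".+", "path(\"5\",1)"));
        // A quoted string stops at the closing quote and leaves the rest alone.
        System.out.println(matchAtStart("\"[^\"]*\"", "\"5\",1)"));
    }
}
```

The first call prints the entire input, which is why no "path" token is ever seen; the second prints only the quoted part, "5" with its quotes.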
Here is what you can do. First, replace the token definitions with
TOKEN:
{
< NUMBER: (["0"-"9"])+ > |
<PATH: "path"> |
<RPAR: "("> |
<LPAR: ")"> |
<COMMA: ","> |
<STRING: "\"" (~["\""])* "\"" >
}
Then change your grammar to
int expr():
{
String leftValue ;
int rightValue ;
}
{
<PATH> <RPAR> leftValue=str() <COMMA> rightValue = num() <LPAR>
{ return 0; }
}
String str():
{
Token t;
int len ;
}
{
t = <STRING>
{ len = t.image.length() ; }
{ return t.image.substring(1,len-1); }
}

JavaCC grammar - proper lexing

I have a JavaCC grammar with following definitions:
<REGULAR_IDENTIFIER : (["A"-"Z"])+ > // simple identifier like say "DODGE"
<_LABEL : (["A"-"Z"])+ (":") > // label, eg "DODGE:"
<DOUBLECOLON : "::">
<COLON : ":">
Right now "DODGE::" is lexed as <_LABEL> <COLON> ("DODGE:" ":")
but I need to lex it as <REGULAR_IDENTIFIER> <DOUBLECOLON> ("DODGE" "::")
I think the following will work
MORE: { < (["A"-"Z"])+ > : S0 } // Could be identifier or label.
<S0> TOKEN: { <LABEL : ":" > : DEFAULT } // label, eg "DODGE:"
<S0> TOKEN: { <IDENTIFIER : "" > : DEFAULT } // simple identifier like say "DODGE"
<S0> TOKEN: { <IDENTIFIER : "::" > { matchedToken.image = image.substring(0,image.length()-2) ; } : S1 }
<S1> TOKEN: { <DOUBLECOLON : "" > { matchedToken.image = "::" ; } : DEFAULT }
<DOUBLECOLON : "::">
<COLON : ":">
Note that "DODGE:::" is three tokens, not two.
JavaCC uses the maximal match rule (longest prefix match); see:
http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-moz.htm#more-than-one
This means that the _LABEL token will be matched before the REGULAR_IDENTIFIER token, as the _LABEL token will contain more characters. This means that what you are trying to do should not be done in the tokenizer.
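The effect of the rule can be reproduced with a tiny hand-written tokenizer. This is only a sketch under simplifying assumptions: the class MaximalMunchSketch is hypothetical, the rules are ordinary Java regular expressions rather than JavaCC token definitions, and ties go to the rule listed first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of longest-match tokenizing; MaximalMunchSketch is a hypothetical name.
class MaximalMunchSketch {
    static List<String> tokenize(String input, Map<String, String> rules) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLen = 0;
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                Matcher m = Pattern.compile(rule.getValue()).matcher(input);
                m.region(pos, input.length());
                // A strictly longer match wins; on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestName = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestName == null) {
                throw new IllegalStateException("No token at position " + pos);
            }
            out.add(bestName + "(" + input.substring(pos, pos + bestLen) + ")");
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("REGULAR_IDENTIFIER", "[A-Z]+");
        rules.put("_LABEL", "[A-Z]+:");
        rules.put("DOUBLECOLON", "::");
        rules.put("COLON", ":");
        System.out.println(tokenize("DODGE::", rules)); // [_LABEL(DODGE:), COLON(:)]
        rules.remove("_LABEL");
        System.out.println(tokenize("DODGE::", rules)); // [REGULAR_IDENTIFIER(DODGE), DOUBLECOLON(::)]
    }
}
```

With the _LABEL rule present, "DODGE:" is the longest match at the start of the input, so only a single colon is left over. Without it, the identifier stops at "DODGE" and "::" is matched as one token.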
I have written a parser that recognizes the grammar correctly. It uses the parser, instead of the tokenizer, to recognize the _LABELs:
options {
STATIC = false;
}
PARSER_BEGIN(Parser)
import java.io.StringReader;
public class Parser {
//Main method, parses the first argument to the program
public static void main(String[] args) throws ParseException {
System.out.println("Parsing: " + args[0]);
Parser parser = new Parser(new StringReader(args[0]));
parser.Start();
}
}
PARSER_END(Parser)
//The _LABEL will be recognized by the parser, not the tokenizer
TOKEN :
{
<DOUBLECOLON : "::"> //The double colon is preferred over the single colon due to the maximal munch rule
|
<COLON : ":">
|
<REGULAR_IDENTIFIER : (["A"-"Z"])+ > // simple identifier like say "DODGE"
}
/** Root production. */
void Start() :
{}
{
(
LOOKAHEAD(2) //We need a lookahead of two, to see if this is a label or not
<REGULAR_IDENTIFIER> <COLON> { System.err.println("label"); } //Labels, should probably be put in their own production
| <REGULAR_IDENTIFIER> { System.err.println("reg_id"); } //Regular identifiers
| <DOUBLECOLON> { System.err.println("DC"); }
| <COLON> { System.err.println("C"); }
)+
}
In a real grammar you should of course move the <REGULAR_IDENTIFIER> <COLON> sequence into a _label production.
Hope it helps.

Resources