Javacc error reporting results in “Expansion can be matched by empty string.” - javacc

I am trying to add some custom error messages to my javacc parser to hopefully make the error messages more specific and the language problems easier to find and correct.
The first error I am trying to focus on is detecting whether the correct number of arguments has been provided to a 'function' call. Rather than the default message, I would like to print out something like "missing argument to function".
My simplified language and my attempt to catch a missing argument error look something like:
double arg(boolean allowMissing):
{ double v; Token t; }
{
t = <INT> { return Double.parseDouble(t.image); }
| t = <DOUBLE> { return Double.parseDouble(t.image); }
| v = functions() { return v; }
| { if (!allowMissing) throw new ParseException("Missing argument");} // #1 Throw error if missing argument
}
double functions() :
{ double v1, v2, result;
double[] array;
}
{
(<MIN> "(" v1=arg(false) "," v2=arg(false) ")") { return (v1<v2)?v1:v2; }
| (<MAX> "(" v1=arg(false) "," v2=arg(false) ")") { return (v1>v2)?v1:v2; }
| (<POW> "(" v1=arg(false) "," v2=arg(false) ")") { return Math.pow(v1, v2); }
| (<SUM> "(" array=argList() ")") { result=0; for (double v:array) result+=v; return result;}
}
double[] argList() :
{
ArrayList<Double> list = new ArrayList<>();
double v;
}
{
( (v=arg(true) { list.add(v);} ( "," v=arg(false) {list.add(v);} )*)?) { // #2 Expansion can be matched by empty string here
double[] arr = new double[list.size()];
for (int i=0; i<list.size(); i++)
arr[i] = list.get(i);
return arr;
}
}
As you can see, functions recursively resolve their arguments, which allows function calls to be nested.
Here are a few valid expressions that can be parsed in this language:
"min(1,2)",
"max(1,2)",
"max(pow(2,2),2)",
"sum(1,2,3,4,5)",
"sum()"
Here is an invalid expression:
"min()"
This all worked well until I tried to check for missing arguments (code location #1). The check works fine for the functions that take a fixed number of arguments. The problem is that the sum function (code location #2) is allowed to have zero arguments. I even pass in a flag so that no error is thrown where missing arguments are allowed. However, javacc gives me an error at location #2: "Expansion within "(...)?" can be matched by empty string". I understand why I get this error. I have also read the answer for JavaCC custom errors cause "Expansion can be matched by empty string." but it did not help me.
My problem is that I just cannot see how I can have this both ways. I want to throw an error for missing arguments in the functions that take a fixed number of arguments, but I don't want an error in the function that allows no arguments. Is there a way to refactor my parser so that I still use the recursive style, catch missing arguments in the functions that take a fixed number of arguments, yet allow some functions to have zero arguments?
Or is there a better way to add in custom error messages? I am not really seeing much in the documentation.
Also, any pointers to examples that use more sophisticated error reporting would be greatly appreciated. I am actually using jjtree, but I simplified it down for this example.

Here's what I would do.
Instead of using a boolean argument in function arg, I would use the ? operator:
double arg():
{ double v; Token t; }
{
t = <INT> { return Double.parseDouble(t.image); }
| t = <DOUBLE> { return Double.parseDouble(t.image); }
| v = functions() { return v; }
}
double functions() :
{ double v1=0, v2=0, result;
double[] array;
}
{
(<MIN> "(" (v1=arg())? "," (v2=arg())? ")") { return (v1<v2)?v1:v2; }
| (<MAX> "(" (v1=arg())? "," (v2=arg())? ")") { return (v1>v2)?v1:v2; }
| (<POW> "(" (v1=arg())? "," (v2=arg())? ")") { return Math.pow(v1, v2); }
| (<SUM> "(" array=argList() ")") { result=0; for (double v:array) result+=v; return result;}
}
double[] argList() :
{
List<Double> list = new ArrayList<Double>();
double v;
}
{
( (v=arg() { list.add(v); } | { list.add(0.); } )
( "," (v=arg() { list.add(v); } | { list.add(0.); } ) )*) {
double[] arr = new double[list.size()];
for (int i=0; i<list.size(); i++)
arr[i] = list.get(i);
return arr;
}
}

You could do this
double arg():
{ double v; Token t; }
{
t = <INT> { return Double.parseDouble(t.image); }
| t = <DOUBLE> { return Double.parseDouble(t.image); }
| v = functions() { return v; }
}
double argRequired():
{ double v; }
{
v = arg() { return v ; }
| { throw new ParseException("Missing argument"); } // #1 Throw error if missing argument (no allowMissing flag exists in this production)
}
double argOptional( double defaultValue ): // Not needed for this example, but might be useful.
{ double v; }
{
v = arg() { return v ; }
| { return defaultValue ; }
}
double functions() :
{ double v1, v2, result;
double[] array;
}
{
(<MIN> "(" v1=argRequired() "," v2=argRequired() ")") { return (v1<v2)?v1:v2; }
| (<MAX> "(" v1=argRequired() "," v2=argRequired() ")") { return (v1>v2)?v1:v2; }
| (<POW> "(" v1=argRequired() "," v2=argRequired() ")") { return Math.pow(v1, v2); }
| (<SUM> "(" array=argList() ")") { result=0; for (double v:array) result+=v; return result;}
}
double[] argList( ) :
{
ArrayList<Double> list = new ArrayList<>();
double v;
}
{
( v=arg() { list.add(v);}
( "," v=argRequired() {list.add(v);}
)*
)?
{
double[] arr = new double[list.size()];
for (int i=0; i<list.size(); i++)
arr[i] = list.get(i);
return arr;
}
}
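To see the required/optional split outside of JavaCC, here is a minimal hand-written recursive-descent sketch in plain Java. It is not what JavaCC generates: the class name ArgSketch and its helper methods are made up for illustration and only mirror the productions above, and it throws IllegalArgumentException where the generated parser would throw ParseException. It also assumes input with plain numbers only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written parser mirroring the grammar above; not JavaCC output.
final class ArgSketch {
    private final String src;
    private int pos = 0;

    private ArgSketch(String src) { this.src = src.replace(" ", ""); }

    /** Parses and evaluates one expression such as "max(pow(2,2),2)". */
    static double eval(String text) {
        ArgSketch p = new ArgSketch(text);
        double v = p.argRequired();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("Unexpected input at " + p.pos);
        return v;
    }

    // True when the next character can start an argument (a digit or a function name).
    private boolean startsArg() {
        return pos < src.length() && Character.isLetterOrDigit(src.charAt(pos));
    }

    // Like argRequired(): an empty match is turned into a custom error.
    private double argRequired() {
        if (!startsArg()) throw new IllegalArgumentException("Missing argument");
        return arg();
    }

    // Like arg(): never matches the empty string, so it is only called when startsArg() is true.
    private double arg() {
        int start = pos;
        if (Character.isDigit(src.charAt(pos))) {
            while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
            return Double.parseDouble(src.substring(start, pos));
        }
        while (pos < src.length() && Character.isLetter(src.charAt(pos))) pos++;
        String name = src.substring(start, pos);
        expect('(');
        double result;
        if (name.equals("sum")) {
            result = 0;
            for (double v : argList()) result += v;
        } else {
            double v1 = argRequired();
            expect(',');
            double v2 = argRequired();
            switch (name) {
                case "min": result = Math.min(v1, v2); break;
                case "max": result = Math.max(v1, v2); break;
                case "pow": result = Math.pow(v1, v2); break;
                default: throw new IllegalArgumentException("Unknown function " + name);
            }
        }
        expect(')');
        return result;
    }

    // Like argList(): the whole list is optional, but every argument after a comma is required.
    private List<Double> argList() {
        List<Double> list = new ArrayList<>();
        if (startsArg()) {
            list.add(arg());
            while (pos < src.length() && src.charAt(pos) == ',') {
                pos++;
                list.add(argRequired());
            }
        }
        return list;
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c)
            throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        pos++;
    }
}
```

With this sketch, ArgSketch.eval("sum()") is accepted because the optional list simply stays empty, while ArgSketch.eval("min()") fails with the message "Missing argument", since only the required variant turns an empty match into an error.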

Related

Javacc grammar not working with optional tokens

I have a parser, created with JavaCC, for DFM files (Delphi source files that define form component layouts, somewhat like JSON).
My grammar (.jj file) defines this production:
private DfmObject dfmObject():
{
DfmObject res = new DfmObject();
DfmProperty prop;
DfmObject obj;
Token tName;
Token tType;
}
{
<OBJECT>
(tName = <IDENTIFIER> { res.setName(tName.image); } <COLON>)?
tType = <IDENTIFIER> { res.setType(tType.image); }
<ENDLINE>
( prop = property() { res.addProperty(prop); } )*
( obj = dfmObject() { res.addChild(obj); } (<ENDLINE>)*)*
<END>
{ return res; }
}
This is for parsing two kinds of object definitions:
object name: Type
end
and also
object Type
end
So the "name:" part is optional.
But, when I try to parse this second DFM, I always get this error:
Exception in thread "main" eu.kaszkowiak.jdfm.parser.ParseException: Encountered " <ENDLINE> "\r\n"" at line 1, column 12.
Was expecting:
":" ...
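What am I doing wrong?
By default JavaCC decides with one token of lookahead: when it sees an <IDENTIFIER> after <OBJECT>, it commits to the optional name part and then requires a <COLON>. A workaround is to make the ": Type" part optional instead, and to swap the name and type values when the type is null.
Here is the grammar with that change: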
private DfmObject dfmObject():
{
DfmObject res = new DfmObject();
DfmProperty prop;
DfmObject obj;
Token tName;
Token tType;
}
{
(
<OBJECT>
(
tName = <IDENTIFIER> { res.setName(tName.image); }
)
( <COLON> tType = <IDENTIFIER> { res.setType(tType.image); } )?
<ENDLINE>
)
( prop = property() { res.addProperty(prop); } )*
( obj = dfmObject() { res.addChild(obj); } (<ENDLINE>)*)*
<END>
{
if (res.getType() == null) {
res.setType(res.getName());
res.setName(null);
}
return res;
}
}
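The swap at the end of that production can be tried out in isolation. The following is only a sketch in plain Java, using a regular expression instead of a JavaCC grammar; the class DfmHeaderSketch and its method are hypothetical names, and the pattern assumes identifiers made of word characters only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; it imitates only the header line of the production above.
final class DfmHeaderSketch {
    // Same shape as the reworked production: one mandatory identifier, then an optional ": Type".
    private static final Pattern HEADER =
            Pattern.compile("object\\s+(\\w+)(?:\\s*:\\s*(\\w+))?\\s*");

    /** Returns {name, type}; name is null when the header has no "name:" part. */
    static String[] parseHeader(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) throw new IllegalArgumentException("Not an object header: " + line);
        String name = m.group(1);
        String type = m.group(2);
        if (type == null) {
            // Only one identifier was present, so it was really the type.
            type = name;
            name = null;
        }
        return new String[] { name, type };
    }
}
```

Reading the first identifier unconditionally and deciding afterwards what it meant is what removes the need to look past the identifier before choosing a branch.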

Define token to match any string

I am new to javacc. I am trying to define a token which can match any string. I am following the regex syntax <ANY: (~[])+> which is not working. I want to achieve something very simple, define an expression having the following BNF:
<exp> ::= "path(" <string> "," <number> ")"
My current .jj file is as follows, any help on how I can parse the string:
options
{
}
PARSER_BEGIN(SimpleAdd)
package SimpleAddTest;
public class SimpleAdd
{
}
PARSER_END(SimpleAdd)
SKIP :
{
" "
| "\r"
| "\t"
| "\n"
}
TOKEN:
{
< NUMBER: (["0"-"9"])+ > |
<PATH: "path"> |
<RPAR: "("> |
<LPAR: ")"> |
<QUOTE: "'"> |
<COMMA: ","> |
<ANY: (~[])+>
}
int expr():
{
String leftValue ;
int rightValue ;
}
{
<PATH> <RPAR> <QUOTE> leftValue = str() <QUOTE> <COMMA> rightValue = num() <LPAR>
{ return 0; }
}
String str():
{
Token t;
}
{
t = <ANY> { return t.toString(); }
}
int num():
{
Token t;
}
{
t = <NUMBER> { return Integer.parseInt(t.toString()); }
}
The error I am getting with the above javacc file is:
Exception in thread "main" SimpleAddTest.ParseException: Encountered " <ANY> "path(\'5\',1) "" at line 1, column 1.
Was expecting:
"path" ...
The pattern <ANY: (~[])+> will indeed match any nonempty string. The issue is that this is not what you really want. If you have a rule <ANY: (~[])+>, it will match the whole file, unless the file is empty. In most cases, because of the longest match rule, the whole file will be parsed as [ANY, EOF]. Is that really what you want? Probably not.
So I'm going to guess at what you really want. I'll guess you want any string that doesn't include a double quote character. Maybe there are other restrictions, such as no nonprinting characters. Maybe you want to allow double quotes if they are preceded by a backslash. Who knows? Adjust as needed.
Here is what you can do. First, replace the token definitions with
TOKEN:
{
< NUMBER: (["0"-"9"])+ > |
<PATH: "path"> |
<RPAR: "("> |
<LPAR: ")"> |
<COMMA: ","> |
<STRING: "\"" (~["\""])* "\"" >
}
Then change your grammar to
int expr():
{
String leftValue ;
int rightValue ;
}
{
<PATH> <RPAR> leftValue=str() <COMMA> rightValue = num() <LPAR>
{ return 0; }
}
String str():
{
Token t;
int len ;
}
{
t = <STRING>
{ len = t.image.length() ; }
{ return t.image.substring(1,len-1); }
}
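The effect of the longest match rule can be imitated in a few lines of plain Java. This is only a sketch built on java.util.regex, not the token manager that JavaCC generates; the class LongestMatchSketch is a made-up name, the rule names are copied from the question, and whitespace skipping is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "longest match wins, earlier rule wins ties".
final class LongestMatchSketch {
    /** Returns the kinds of the tokens found in input, in order. */
    static List<String> tokenize(Map<String, String> rules, String input) {
        List<String> kinds = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestKind = null;
            int bestEnd = pos;
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                Matcher m = Pattern.compile(rule.getValue(), Pattern.DOTALL).matcher(input);
                m.region(pos, input.length());
                // Strictly longer only, so the rule defined first keeps a tie.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestKind = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestKind == null) throw new IllegalArgumentException("No token matches at " + pos);
            kinds.add(bestKind);
            pos = bestEnd;
        }
        return kinds;
    }

    /** The rules shared by both versions of the grammar, in definition order. */
    static Map<String, String> baseRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("NUMBER", "[0-9]+");
        rules.put("PATH", "path");
        rules.put("RPAR", "\\(");
        rules.put("LPAR", "\\)");
        rules.put("COMMA", ",");
        return rules;
    }
}
```

Adding a rule ANY with the pattern ".+" to baseRules() makes the whole input come back as a single ANY token, whereas adding STRING with the pattern "\"[^\"]*\"" splits path("5",1) into PATH, RPAR, STRING, COMMA, NUMBER, LPAR.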

JavaCC simple example not working

I am trying javacc for the first time with a simple naive example which is not working. My BNF is as follows:
<exp>:= <num>"+"<num>
<num>:= <digit> | <digit><num>
<digit>:= [0-9]
Based on this BNF, I am writing the SimpleAdd.jj as follows:
options
{
}
PARSER_BEGIN(SimpleAdd)
public class SimpleAdd
{
}
PARSER_END(SimpleAdd)
SKIP :
{
" "
| "\r"
| "\t"
| "\n"
}
TOKEN:
{
< NUMBER: (["0"-"9"])+ >
}
int expr():
{
int leftValue ;
int rightValue ;
}
{
leftValue = num()
"+"
rightValue = num()
{ return leftValue+rightValue; }
}
int num():
{
Token t;
}
{
t = <NUMBER> { return Integer.parseInt(t.toString()); }
}
Using the above file, I generate the Java source classes. My main class is as follows:
public class Main {
public static void main(String [] args) throws ParseException {
SimpleAdd parser = new SimpleAdd(System.in);
int x = parser.expr();
System.out.println(x);
}
}
When I am entering the expression via System.in, I am getting the following error:
11+11^D
Exception in thread "main" SimpleAddTest.ParseException: Encountered "<EOF>" at line 0, column 0.
Was expecting:
<NUMBER> ...
at SimpleAddTest.SimpleAdd.generateParseException(SimpleAdd.java:200)
at SimpleAddTest.SimpleAdd.jj_consume_token(SimpleAdd.java:138)
at SimpleAddTest.SimpleAdd.num(SimpleAdd.java:16)
at SimpleAddTest.SimpleAdd.expr(SimpleAdd.java:7)
at SimpleAddTest.Main.main(Main.java:9)
Any hint to solve the problem ?
Edit: Note that this answer addresses an earlier version of the question.
When a BNF production uses a nonterminal that returns a result, you can record that result in a variable.
First declare the variables in the declaration part of the BNF production
int expr():
{
int leftValue ;
int rightValue ;
}
{
Second, in the main body of the production, record the results in the variables.
leftValue = num()
"+"
rightValue = num()
Finally, use the values of those variables to compute the result of this production.
{ return leftValue+rightValue; }
}

directions for use javacc token

I want to distinguish multiple tokens.
Look at my code.
TOKEN :
{
< LOOPS :
< BEAT >
| < BASS >
| < MELODY >
>
| < #BEAT : "beat" >
| < #BASS : "bass" >
| < #MELODY : "melody" >
}
void findType():
{Token loops;}
{
loops = < LOOPS >
{ String type = loops.image; }
I want to use the findType() function to find the type.
How can I get it to return the correct output when the input is "beat"?
What you want to do is to add a return statement, like this:
String findType():
{Token loops;}
{
loops = < LOOPS >
{
String type = loops.image;
return type;
}
}
Keep in mind that this changes the method's return type from void to String.
Then, from your main:
ExampleGrammar parser = new ExampleGrammar(System.in);
while (true)
{
System.out.println("Reading from standard input...");
System.out.print("Enter loop:");
try
{
String type = ExampleGrammar.findType();
System.out.println("Type is: " + type);
}
catch (Exception e)
{
System.out.println("NOK.");
System.out.println(e.getMessage());
ExampleGrammar.ReInit(System.in);
}
catch (Error e)
{
System.out.println("Oops.");
System.out.println(e.getMessage());
break;
}
}
It generates an output like:
Reading from standard input...
Enter loop:bass
Type is: bass
Reading from standard input...
Enter loop:beat
Type is: beat

Is it possible to use a Kleene Operator for Flex Formatters?

Is it possible to use a Kleene operator (Kleene star) in Flex formatters?
I want to use a PhoneFormatter that puts a hyphen after the fourth digit and then allows a variable number of further digits.
E.g.: 0172-555666999, 0160-44552 etc.
That is how I started, but I don't know which character belongs after the last hash (it is not a star, I already tried it ;-) ):
<fx:Declarations>
<mx:PhoneFormatter id="mPhoneFormat"
formatString="####-#"/>
</fx:Declarations>
The default PhoneFormatter expects the input string to have the same number of characters as the format string has "#" placeholders. It doesn't support regular expression patterns (like * to match an element zero or more times).
However, it's pretty easy to make your own formatter. To do this, I extended the PhoneFormatter class and overrode its format() method. I copied and pasted the original format() method and made the following modifications:
Commented out the code that compares the length of the source string with the length of the format string.
After formatting, compared the length of the source string with the length of the formatted string: if the source string is longer, the remaining characters of the source string are appended to the formatted string.
This probably won't handle all of your use cases, but it should be pretty straightforward to modify this to your needs.
package
{
import mx.formatters.PhoneFormatter;
import mx.formatters.SwitchSymbolFormatter;
public class CustomPhoneNumberFormatter extends PhoneFormatter
{
public function CustomPhoneNumberFormatter()
{
super();
}
override public function format(value:Object):String
{
// Reset any previous errors.
if (error)
error = null;
// --value--
if (!value || String(value).length == 0 || isNaN(Number(value)))
{
error = defaultInvalidValueError;
return "";
}
// --length--
var fStrLen:int = 0;
var letter:String;
var n:int;
var i:int;
n = formatString.length;
for (i = 0; i < n; i++)
{
letter = formatString.charAt(i);
if (letter == "#")
{
fStrLen++;
}
else if (validPatternChars.indexOf(letter) == -1)
{
error = defaultInvalidFormatError;
return "";
}
}
// if (String(value).length != fStrLen)
// {
// error = defaultInvalidValueError;
// return "";
// }
// --format--
var fStr:String = formatString;
if (fStrLen == 7 && areaCode != -1)
{
var aCodeLen:int = 0;
n = areaCodeFormat.length;
for (i = 0; i < n; i++)
{
if (areaCodeFormat.charAt(i) == "#")
aCodeLen++;
}
if (aCodeLen == 3 && String(areaCode).length == 3)
{
fStr = String(areaCodeFormat).concat(fStr);
value = String(areaCode).concat(value);
}
}
var dataFormatter:SwitchSymbolFormatter = new SwitchSymbolFormatter();
var source:String = String(value);
var returnValue:String = dataFormatter.formatValue(fStr, value);
if (source.length > returnValue.length)
{
returnValue = returnValue + source.substr(returnValue.length-1);
}
return returnValue;
}
}
}
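The same "format, then append the rest" idea can be sketched independently of Flex. The Java version below is an illustration only, not a port of PhoneFormatter: the class PhoneFormatSketch is a hypothetical name, and it tracks how many digits have been consumed instead of deriving the offset from the length of the formatted string.

```java
// Hypothetical stand-alone sketch of the approach; it does not use any Flex classes.
final class PhoneFormatSketch {
    /** Fills each '#' in pattern with the next digit, then appends any digits the pattern did not consume. */
    static String format(String pattern, String digits) {
        StringBuilder out = new StringBuilder();
        int used = 0; // number of digits already placed into the pattern
        for (int i = 0; i < pattern.length() && used < digits.length(); i++) {
            char c = pattern.charAt(i);
            out.append(c == '#' ? digits.charAt(used++) : c);
        }
        // Whatever the pattern did not cover is appended unchanged.
        out.append(digits.substring(used));
        return out.toString();
    }
}
```

For example, PhoneFormatSketch.format("####-#", "0172555666999") gives "0172-555666999". Counting consumed digits keeps the sketch correct for patterns with more than one separator; the ActionScript above uses returnValue.length-1 as the offset, which appears to assume a format string with exactly one separator character.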
