JavaCC grammar - proper lexing - javacc

I have a JavaCC grammar with the following definitions:
<REGULAR_IDENTIFIER : (["A"-"Z"])+ > // simple identifier like say "DODGE"
<_LABEL : (["A"-"Z"])+ (":") > // label, eg "DODGE:"
<DOUBLECOLON : "::">
<COLON : ":">
Right now "DODGE::" is lexed as <_LABEL> <COLON> ("DODGE:" ":"),
but I need it to be lexed as <REGULAR_IDENTIFIER> <DOUBLECOLON> ("DODGE" "::").

I think the following will work:
MORE: { < (["A"-"Z"])+ > : S0 } // Could be identifier or label.
<S0> TOKEN: { < LABEL : ":" > : DEFAULT } // label, eg "DODGE:"
<S0> TOKEN: { < IDENTIFIER : "" > : DEFAULT } // simple identifier like say "DODGE"
<S0> TOKEN: { < IDENTIFIER : "::" > { matchedToken.image = image.substring(0, image.length()-2) ; } : S1 }
<S1> TOKEN: { < DOUBLECOLON : "" > { matchedToken.image = "::" ; } : DEFAULT }
<DOUBLECOLON : "::">
<COLON : ":">
Note that "DODGE:::" is three tokens, not two.

JavaCC uses the maximal munch rule (the longest prefix match wins); see:
http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-moz.htm#more-than-one
This means that the _LABEL token will be matched in preference to the REGULAR_IDENTIFIER token, because the _LABEL match contains more characters. In other words, what you are trying to do should not be done in the tokenizer.
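To make the rule concrete, here is a small sketch in plain Java. It is not JavaCC's token manager, only a toy longest-match lexer built on java.util.regex, and the class name MaximalMunchDemo is made up for this illustration. It applies the four definitions from the question and shows why "DODGE::" comes out as a label followed by a colon.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy longest-match lexer; a stand-in for a generated token manager, not JavaCC itself. */
final class MaximalMunchDemo {
    // Token kinds in declaration order, mirroring the definitions in the question.
    private static final Map<String, Pattern> KINDS = new LinkedHashMap<>();
    static {
        KINDS.put("REGULAR_IDENTIFIER", Pattern.compile("[A-Z]+"));
        KINDS.put("_LABEL", Pattern.compile("[A-Z]+:"));
        KINDS.put("DOUBLECOLON", Pattern.compile("::"));
        KINDS.put("COLON", Pattern.compile(":"));
    }

    /** At each position, pick the kind with the longest match; earlier kinds win ties. */
    static List<String> lex(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestKind = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> e : KINDS.entrySet()) {
                Matcher m = e.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestKind = e.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestKind == null) {
                throw new IllegalArgumentException("Lexical error at position " + pos);
            }
            out.add(bestKind + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("DODGE::")); // prints [_LABEL(DODGE:), COLON(:)]
    }
}
```

At position 0 the label pattern consumes six characters and the identifier pattern only five, so the label wins before "::" is ever considered.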
I have written a parser which recognizes the grammar correctly. It uses the parser, instead of the tokenizer, to recognize the _LABELs:
options {
  STATIC = false;
}
PARSER_BEGIN(Parser)
import java.io.StringReader;
public class Parser {
  // Main method: parses the first argument to the program
  public static void main(String[] args) throws ParseException {
    System.out.println("Parsing: " + args[0]);
    Parser parser = new Parser(new StringReader(args[0]));
    parser.Start();
  }
}
PARSER_END(Parser)
// The _LABEL will be recognized by the parser, not the tokenizer
TOKEN :
{
  <DOUBLECOLON : "::"> // The double colon is preferred to the single colon due to the maximal munch rule
|
  <COLON : ":">
|
  <REGULAR_IDENTIFIER : (["A"-"Z"])+ > // simple identifier like say "DODGE"
}
/** Root production. */
void Start() :
{}
{
  (
    LOOKAHEAD(2) // We need a lookahead of two to see whether this is a label or not
    <REGULAR_IDENTIFIER> <COLON> { System.err.println("label"); } // Labels; should probably be put in their own production
  | <REGULAR_IDENTIFIER> { System.err.println("reg_id"); } // Regular identifiers
  | <DOUBLECOLON> { System.err.println("DC"); }
  | <COLON> { System.err.println("C"); }
  )+
}
In a real grammar you should of course move the <REGULAR_IDENTIFIER> <COLON> sequence to a _label production.
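The same idea can be sketched without JavaCC. The following hand-written Java is only an approximation of what the grammar above does: the class name LabelLookaheadDemo and both methods are invented for this illustration. The tokenizer knows nothing about labels, and the parsing loop peeks one token past an identifier to decide whether it is a label.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hand-written approximation of the grammar above; not generated by JavaCC. */
final class LabelLookaheadDemo {
    // "::" is listed before ":" so that the alternation prefers the longer token.
    private static final Pattern TOKEN = Pattern.compile("::|:|[A-Z]+");

    /** Splits the input into identifier, "::" and ":" tokens only. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Lexical error at position " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    /** Classifies the token stream, looking two tokens ahead to spot labels. */
    static List<String> parse(String input) {
        List<String> tokens = tokenize(input);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String t = tokens.get(i);
            boolean isId = Character.isUpperCase(t.charAt(0));
            if (isId && i + 1 < tokens.size() && tokens.get(i + 1).equals(":")) {
                out.add("label"); // identifier followed by a single colon
                i += 2;
            } else if (isId) {
                out.add("reg_id");
                i++;
            } else if (t.equals("::")) {
                out.add("DC");
                i++;
            } else {
                out.add("C");
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("DODGE::")); // prints [reg_id, DC]
        System.out.println(parse("DODGE:"));  // prints [label]
    }
}
```

Because the colon decision is made after tokenizing, "DODGE::" is never split as "DODGE:" plus ":".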
Hope it helps.

Related

Generate Parser for a File with JavaCC

I am a beginner with JavaCC, and I'm trying to generate a parser for a file.
I have already been able to generate a working parser that interprets a line entered on the keyboard.
For example, when I type "First Name: William", it displays the name of the variable and its value on the screen.
Now I have a .txt file which contains a large number of names and their values, and I would like to display them on the screen in the same way.
Below is the .jj file that I have already written to generate a parser for a typed line.
Now I want the same, but for a file.
options
{
static = true;
}
PARSER_BEGIN(parser_name)
public class parser_name
{
public static void main(String args []) throws ParseException
{
System.out.println("Waiting for the Input:");
parser_name parser = new parser_name(System.in);
parser.Start();
}
}
PARSER_END(parser_name)
SKIP :
{
" "
| "\r"
| "\t"
| "\n"
}
TOKEN : { < DIGIT : (["0"-"9"])+ > }
TOKEN : { <VARIABLE: (["a"-"z", "A"-"Z"])+> }
TOKEN : { <VALUE: (~["\n",":"])+> }
TOKEN : { <ASSIGNMENT: ":"> }
void Start(): { Token t,t1,t2;}
{
t=<VARIABLE>
t1=<ASSIGNMENT>
t2=<VALUE>
{ System.out.println("The Variable is "+t.image+",and the Value is "+t2.image); }
}
I have already tried replacing "System.in" in the parser constructor with an object of type File and then reading the file line by line, but it did not work.
Pass a Reader to the parser's constructor.
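A rough sketch of what that looks like in practice. The generated class cannot be included here, so ReaderInputDemo below is a hypothetical hand-written stand-in that only shows the plumbing: the parsing code depends on java.io.Reader, so a keyboard stream, a string, or a file can be swapped in without touching it. With the generated parser the equivalent call would be along the lines of new parser_name(new FileReader(...)). The file name in the comment is a placeholder. Note also that the question's Start() production matches a single entry, so a whole file would presumably also need the production to repeat until <EOF>.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a generated parser: reads "name: value" lines from any Reader. */
final class ReaderInputDemo {
    static List<String> parseAll(Reader source) {
        List<String> out = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int colon = line.indexOf(':');
                if (colon < 0) {
                    continue; // skip lines without an assignment
                }
                String name = line.substring(0, colon).trim();
                String value = line.substring(colon + 1).trim();
                out.add(name + "=" + value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // For a file on disk, swap in: new java.io.FileReader("names.txt") (placeholder path).
        Reader source = new StringReader("First Name: William\nLast Name: Smith\n");
        System.out.println(parseAll(source)); // prints [First Name=William, Last Name=Smith]
    }
}
```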

Define token to match any string

I am new to javacc. I am trying to define a token which can match any string. I am following the regex syntax <ANY: (~[])+> which is not working. I want to achieve something very simple, define an expression having the following BNF:
<exp> ::= "path(" <string> "," <number> ")"
My current .jj file is as follows. Any help on how I can parse the string would be appreciated:
options
{
}
PARSER_BEGIN(SimpleAdd)
package SimpleAddTest;
public class SimpleAdd
{
}
PARSER_END(SimpleAdd)
SKIP :
{
" "
| "\r"
| "\t"
| "\n"
}
TOKEN:
{
< NUMBER: (["0"-"9"])+ > |
<PATH: "path"> |
<RPAR: "("> |
<LPAR: ")"> |
<QUOTE: "'"> |
<COMMA: ","> |
<ANY: (~[])+>
}
int expr():
{
String leftValue ;
int rightValue ;
}
{
<PATH> <RPAR> <QUOTE> leftValue = str() <QUOTE> <COMMA> rightValue = num() <LPAR>
{ return 0; }
}
String str():
{
Token t;
}
{
t = <ANY> { return t.toString(); }
}
int num():
{
Token t;
}
{
t = <NUMBER> { return Integer.parseInt(t.toString()); }
}
The error I am getting with the above javacc file is:
Exception in thread "main" SimpleAddTest.ParseException: Encountered " <ANY> "path(\'5\',1) "" at line 1, column 1.
Was expecting:
"path" ...
The pattern <ANY: (~[])+> will indeed match any nonempty string. The issue is that this is not what you really want. If you have a rule <ANY: (~[])+>, it will match the whole file, unless the file is empty. In most cases, because of the longest match rule, the whole file will be parsed as [ANY, EOF]. Is that really what you want? Probably not.
So I'm going to guess at what you really want. I'll guess you want any string that doesn't include a double quote character. Maybe there are other restrictions, such as no nonprinting characters. Maybe you want to allow double quotes if they are preceded by a backslash. Who knows? Adjust as needed.
Here is what you can do. First, replace the token definitions with
TOKEN:
{
  < NUMBER: (["0"-"9"])+ > |
  <PATH: "path"> |
  <RPAR: "("> |
  <LPAR: ")"> |
  <COMMA: ","> |
  <STRING: "\"" (~["\""])* "\"" >
}
Then change your grammar to
int expr():
{
  String leftValue ;
  int rightValue ;
}
{
  <PATH> <RPAR> leftValue = str() <COMMA> rightValue = num() <LPAR>
  { return 0; }
}
String str():
{
  Token t;
  int len ;
}
{
  t = <STRING>
  { len = t.image.length() ; }
  { return t.image.substring(1, len-1); }
}
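The difference between the two token definitions can be tried out with ordinary Java regular expressions. This is only an analogy: the patterns below are rough java.util.regex equivalents of <ANY> and <STRING>, and the class and method names are made up for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Regex analogy for the two token definitions; illustrative only, not JavaCC syntax. */
final class CatchAllTokenDemo {
    // Roughly <ANY: (~[])+> : one or more of any character, including newlines.
    static final Pattern ANY = Pattern.compile("(?s).+");
    // Roughly <STRING: "\"" (~["\""])* "\"" > : a double-quoted run without inner quotes.
    static final Pattern STRING = Pattern.compile("\"[^\"]*\"");

    /** Returns the text the pattern would consume starting at offset pos, or null if none. */
    static String matchAt(Pattern p, String input, int pos) {
        Matcher m = p.matcher(input).region(pos, input.length());
        return m.lookingAt() ? m.group() : null;
    }

    /** Strips the surrounding quotes, as str() does with substring(1, len-1). */
    static String unquote(String image) {
        return image.substring(1, image.length() - 1);
    }

    public static void main(String[] args) {
        String input = "path(\"5\",1)";
        // The catch-all pattern swallows the entire input from the first character on.
        System.out.println(matchAt(ANY, input, 0));
        // The quoted-string pattern stops at the closing quote.
        System.out.println(unquote(matchAt(STRING, input, 5))); // prints 5
    }
}
```

With the catch-all pattern nothing is left over for "path", "(" or the number, which is why the parser reports an unexpected <ANY> at line 1, column 1.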

JavaCC simple example not working

I am trying javacc for the first time with a simple naive example which is not working. My BNF is as follows:
<exp>:= <num>"+"<num>
<num>:= <digit> | <digit><num>
<digit>:= [0-9]
Based on this BNF, I am writing the SimpleAdd.jj as follows:
options
{
}
PARSER_BEGIN(SimpleAdd)
public class SimpleAdd
{
}
PARSER_END(SimpleAdd)
SKIP :
{
" "
| "\r"
| "\t"
| "\n"
}
TOKEN:
{
< NUMBER: (["0"-"9"])+ >
}
int expr():
{
int leftValue ;
int rightValue ;
}
{
leftValue = num()
"+"
rightValue = num()
{ return leftValue+rightValue; }
}
int num():
{
Token t;
}
{
t = <NUMBER> { return Integer.parseInt(t.toString()); }
}
Using the above file, I generate the Java source classes. My main class is as follows:
public class Main {
public static void main(String [] args) throws ParseException {
SimpleAdd parser = new SimpleAdd(System.in);
int x = parser.expr();
System.out.println(x);
}
}
When I enter the expression via System.in, I get the following error:
11+11^D
Exception in thread "main" SimpleAddTest.ParseException: Encountered "<EOF>" at line 0, column 0.
Was expecting:
<NUMBER> ...
at SimpleAddTest.SimpleAdd.generateParseException(SimpleAdd.java:200)
at SimpleAddTest.SimpleAdd.jj_consume_token(SimpleAdd.java:138)
at SimpleAddTest.SimpleAdd.num(SimpleAdd.java:16)
at SimpleAddTest.SimpleAdd.expr(SimpleAdd.java:7)
at SimpleAddTest.Main.main(Main.java:9)
Any hints on how to solve the problem?
Edit: Note that this answer addresses an earlier version of the question.
When a BNF production uses a nonterminal that returns a result, you can record that result in a variable.
First, declare the variables in the declaration part of the BNF production.
int expr():
{
int leftValue ;
int rightValue ;
}
{
Second, in the main body of the production, record the results in the variables.
leftValue = num()
"+"
rightValue = num()
Finally, use the values of those variables to compute the result of this production.
{ return leftValue+rightValue; }
}
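For comparison, here is the same pattern written by hand in plain Java. It is a sketch of a recursive-descent equivalent, not the code JavaCC generates, and the class name SimpleAddDemo is invented for the illustration. Each method plays the role of one production and returns its value, and expr() stores the two num() results in local variables before combining them.

```java
/** Hand-written sketch of expr := num "+" num; not the code JavaCC generates. */
final class SimpleAddDemo {
    private final String input;
    private int pos = 0;

    SimpleAddDemo(String input) {
        this.input = input;
    }

    // Plays the role of the SKIP section: ignore blanks, tabs and line breaks.
    private void skipWhitespace() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }

    /** num := (digit)+ ; returns the numeric value of the digits read. */
    int num() {
        skipWhitespace();
        int start = pos;
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos) {
            throw new IllegalStateException("Expected a number at position " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    /** expr := num "+" num ; records both results, then computes the sum. */
    int expr() {
        int leftValue;
        int rightValue;
        leftValue = num();
        skipWhitespace();
        if (pos >= input.length() || input.charAt(pos) != '+') {
            throw new IllegalStateException("Expected '+' at position " + pos);
        }
        pos++;
        rightValue = num();
        return leftValue + rightValue;
    }

    public static void main(String[] args) {
        System.out.println(new SimpleAddDemo("11+11").expr()); // prints 22
    }
}
```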

Semi-colon required, optional, or disallowed in gRPC option value?

I'm seeing one piece of code like the following:
rpc SayFallback (FooRequest) returns (FooResponse) {
option (com.example.proto.options.bar) = {
value : "{ message:\"baz\" }";
};
}
and another like the following:
rpc SayFallback (FooRequest) returns (FooResponse) {
option (com.example.proto.options.bar) = {
value : "{ message:\"baz\" }"
};
}
The first has a ; on the line with value while the second doesn't. Is either OK according to the standard?
Yes, they are considered optional. See the protobuf file source snippet:
while (!TryConsumeEndOfDeclaration("}", NULL)) {
if (AtEnd()) {
AddError("Reached end of input in method options (missing '}').");
return false;
}
if (TryConsumeEndOfDeclaration(";", NULL)) {
// empty statement; ignore
} else {
...
}

JavaCC IntegerLiteral

I am using JavaCC to build a lexer and a parser and I have the following code:
TOKEN:
{
< #DIGIT : [ "0"-"9" ] >
|< INTEGER_LITERAL : (<DIGIT>)+ >
}
SimpleNode IntegerLiteral() :
{
Token t;
}
{
(t=<INTEGER_LITERAL>)
{
Integer n = new Integer(t.image);
jjtThis.jjtSetValue( n );
return jjtThis;
}
}
Hence it should accept only integers, but it also accepts "4." or "4 %%%%%%" etc.
Try turning on debugging in your parser spec file, like:
options {
  DEBUG_TOKEN_MANAGER = true;
}
This will create a printout of what the TokenManager is doing while parsing.
"4." and "4%%%%" are not really accepted, because what is read by your parser is always just "4".
If you set DEBUG_PARSER = true; in the options section, you will see the token currently being read.
I think if you change your grammar like this, you can see that it throws a TokenMgrError when it reads the unhandled character:
SimpleNode IntegerLiteral() :
{
Token t;
}
{
(
t=<DIGIT>
{
Integer n = new Integer(t.image);
jjtThis.jjtSetValue( n );
return jjtThis;
})+
}
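The point that only "4" is ever read can be illustrated with a small regex analogy in plain Java. The class name is hypothetical and this is not how the generated parser is implemented. Matching a prefix succeeds on "4.", while insisting on the whole input does not. In a JavaCC grammar the usual way to get the second behaviour is to require <EOF> after the production, which is an addition not mentioned in the answer above.

```java
import java.util.regex.Pattern;

/** Regex analogy: consuming one literal versus requiring that nothing follows it. */
final class PrefixVsWholeInputDemo {
    private static final Pattern INTEGER_LITERAL = Pattern.compile("[0-9]+");

    /** Mimics a production that reads one INTEGER_LITERAL and then stops looking. */
    static boolean acceptsPrefix(String input) {
        return INTEGER_LITERAL.matcher(input).lookingAt();
    }

    /** Mimics also demanding end of input after the literal. */
    static boolean acceptsWhole(String input) {
        return INTEGER_LITERAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(acceptsPrefix("4.")); // prints true
        System.out.println(acceptsWhole("4."));  // prints false
        System.out.println(acceptsWhole("42"));  // prints true
    }
}
```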

Resources