Lexical error at line 0, column 0 - javacc

In the grammar below, I am trying to configure any line that starts with ' as a single-line comment, and anything between /' and '/ as a multi-line comment (/' Multiline Comment '/). The single-line comment works OK. But for some reason, as soon as I type / or ' or ; or < or >, I get the errors below. I don't have these characters configured. Shouldn't they be treated as default input and skipped by the parser?
Error
Lexical error at line 0, column 0. Encountered: "\"" (34), after : ""
Lexical error at line 0, column 0. Encountered: ">" (62), after : ""
Lexical error at line 0, column 0. Encountered: "\n" (10), after : "-"
I have only included part of the code below for conciseness. For full Lexer definition please visit the link
TOKEN :
{
< WHITESPACE:
" "
| "\t"
| "\n"
| "\r"
| "\f">
}
/* COMMENTS */
MORE :
{
<"/'"> { input_stream.backup(1); } : IN_MULTI_LINE_COMMENT
}
<IN_MULTI_LINE_COMMENT>
TOKEN :
{
<MULTI_LINE_COMMENT: "'/" > : DEFAULT
}
<IN_MULTI_LINE_COMMENT>
MORE :
{
< ~[] >
}
TOKEN :
{
<SINGLE_LINE_COMMENT: "'" (~["\n", "\r"])* ("\n" | "\r" | "\r\n")?>
}

I can't reproduce every aspect of your problem. You say there is an error "as soon as" you enter certain characters. Here is what I get.
/ There is no error if the next character is '. If the next character is anything else, there is an error.
' I see no error. This is correctly treated as the start of a comment.
; There is always an error. No token can start with ;.
< There is only an error if the next characters are not - or <-.
> There is always an error. No token can start with >.
I'm not exactly sure why you would expect these not to be errors, since your lexer has no rules to cover these cases. Generally when there is no rule to match a prefix of the input and the input is not exhausted, there will be a TokenMgrError thrown.
If you want to eliminate all these TokenMgrErrors, make a catch-all rule (as explained in the FAQ):
TOKEN: { <UNEXPECTED_CHARACTER: ~[] > }
Make sure this is the very last rule in the .jj file. This rule says that, when no other rule applies, the next character is treated as an UNEXPECTED_CHARACTER token. Of course this just boots the problem up to the parsing level. If you really want the tokenizer to skip all characters that don't belong, just use the following rule as the very last rule:
SKIP : { < ~[] > }
For most languages, that would be an odd thing to do, which is why it is not the default.
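The behaviour described above can be mimicked outside JavaCC. The following is a minimal Python sketch, not JavaCC's generated token manager: `tokenize`, `LexicalError`, `BASE` and the two catch-all rule lists are hypothetical names, and Python's `re` stands in for the JavaCC regular expressions. It shows an unmatched character raising an error, and the two catch-all variants removing that error.

```python
import re


class LexicalError(Exception):
    """Stands in for JavaCC's TokenMgrError in this sketch."""


def tokenize(text, rules):
    """rules is an ordered list of (kind, pattern, action); action is 'TOKEN' or 'SKIP'."""
    compiled = [(kind, re.compile(pattern), action) for kind, pattern, action in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for kind, regex, action in compiled:
            m = regex.match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and m.end() > pos and (best is None or m.end() > best[0]):
                best = (m.end(), kind, action)
        if best is None:
            # No rule matches a prefix of the remaining input.
            raise LexicalError(f"Encountered: {text[pos]!r} at offset {pos}")
        end, kind, action = best
        if action == "TOKEN":
            tokens.append((kind, text[pos:end]))
        pos = end
    return tokens


# Two rules loosely modelled on the question's grammar (assumed, simplified).
BASE = [
    ("WHITESPACE", r"[ \t\n\r\f]", "TOKEN"),
    ("SINGLE_LINE_COMMENT", r"'[^\n\r]*(?:\r\n|\n|\r)?", "TOKEN"),
]
# The catch-all goes last, so it only wins when nothing else matches.
CATCH_ALL_TOKEN = BASE + [("UNEXPECTED_CHARACTER", r"(?s).", "TOKEN")]
CATCH_ALL_SKIP = BASE + [("ANY", r"(?s).", "SKIP")]
```

With `BASE` alone, `tokenize(">", BASE)` raises `LexicalError`. With `CATCH_ALL_TOKEN` the same input yields an `UNEXPECTED_CHARACTER` token, and with `CATCH_ALL_SKIP` the character is silently dropped.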

Related

I need to parse a string using javacc containing single quotes as part of the string

I have defined grammar rules like
TOKEN : { < SINGLE_QUOTE : " ' " > }
TOKEN : { < STRING_LITERAL : " ' " (~["\n","\r"])* " ' ">
But I am not able to parse sequences like 're'd'. I need the parser to parse re'd as a string literal, but with these rules the parser parses 're' and 'd' separately.
If you need to lex re'd as a STRING_LITERAL token, then use the following rules:
TOKEN : { < SINGLE_QUOTE : "'" > }
TOKEN : { < STRING_LITERAL : ("'")? (~["\n","\r"])* ("'")? > }
I didn't see the rule for matching "re" separately.
In JavaCC, your STRING_LITERAL specification requires the token to start with a single quote ("'"), but your input does not start with one.
The "?" added to STRING_LITERAL makes each single quote optional, with at most one present. The rule therefore matches your input and lexes it as STRING_LITERAL.
JavaCC decision-making rules:
1.) JavaCC looks for the longest match.
Here, if the input starts with "'", the possible matches are SINGLE_QUOTE and STRING_LITERAL. The second input character decides which token is chosen: if the line continues, STRING_LITERAL is the longer match.
2.) When several matches have the same length, JavaCC takes the rule declared first in the grammar.
Here, if the input is only "'", it is lexed as SINGLE_QUOTE, even though both SINGLE_QUOTE and STRING_LITERAL match.
Hope this will help you...
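The two decision rules can be sketched in a few lines of Python. This is an illustration only: `next_token` and `RULES` are hypothetical names, and Python's backtracking `re` is used in place of JavaCC's matcher (for these particular patterns, greedy matching gives the longest match).

```python
import re

# Rules in declaration order, mirroring SINGLE_QUOTE and STRING_LITERAL above.
RULES = [
    ("SINGLE_QUOTE", re.compile(r"'")),
    ("STRING_LITERAL", re.compile(r"'?[^\n\r]*'?")),
]


def next_token(text, pos=0, rules=RULES):
    """Return (kind, lexeme) for the token starting at pos."""
    best_kind, best_end = None, pos
    for kind, regex in rules:
        m = regex.match(text, pos)
        # Only a strictly longer match replaces the current choice,
        # so on equal length the rule declared first is kept.
        if m and m.end() > best_end:
            best_kind, best_end = kind, m.end()
    return best_kind, text[pos:best_end]
```

In this sketch, `next_token("'")` picks `SINGLE_QUOTE` (both rules match one character, the first wins), while `next_token("re'd")` picks `STRING_LITERAL` because it is the only rule that matches and it covers the whole line.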
The following should work:
TOKEN : { < SINGLE_QUOTE : "'" > }
TOKEN : { < STRING_LITERAL : "'" (~["\n","\r"])* "'"> }
This is pretty much what you had, except that I removed some spaces.
Now if there are two or more apostrophes on a line (i.e. without an intervening newline or return), then the first and the last of those apostrophes, together with all characters between them, should be lexed as one STRING_LITERAL token. That includes all intervening apostrophes. This is assuming there are no other rules involving apostrophes. For example, if your file is 're'd' that should lex as one token; likewise 'abc' + 'def' should lex as one token.
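The first-to-last-apostrophe behaviour can be checked with an equivalent Python regular expression. This is a rough sketch: `literals`, `GREEDY` and `NO_QUOTE_BODY` are hypothetical names, `finditer` searches rather than lexes, and the second pattern is only an assumed alternative shown for contrast, not something the answer proposes.

```python
import re

# Same shape as the STRING_LITERAL rule: the body may contain apostrophes.
GREEDY = re.compile(r"'[^\n\r]*'")
# Hypothetical alternative whose body excludes apostrophes.
NO_QUOTE_BODY = re.compile(r"'[^'\n\r]*'")


def literals(pattern, text):
    """Return every non-overlapping match of pattern in text, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]
```

Here `literals(GREEDY, "'abc' + 'def'")` returns a single match spanning the whole line, whereas `literals(NO_QUOTE_BODY, "'abc' + 'def'")` returns `'abc'` and `'def'` separately.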

SKIP not working at the beginning of the line

having this
|83.56|
|as.63|
|as.lk|
|as45.34|
as input in a *.txt file, I need to skip the character "|" at the beginning and also at the end of the line, because the output should be
<83.56> :only numbers
<as.63>
<as.lk> :only letters
<as45.34> :numbers/letters together
I have this as my code declaration
without "|" at the beginning
and nothing appears as the result. This is strange, because if I put in the character "|" this way, the result is almost the expected one:
<|83.56> :only numbers
<|as.63>
<|as.lk> :only letters
<|as45.34> :numbers/letters together
So the problem is that the "|" at the end of the line is being skipped properly, but the one at the beginning is not.
Note: I have also declared Numeros and Letras_minusculas at the beginning, like this:
TOKEN:{<Numeros:["0"-"9"]>}
TOKEN:{<Letras_minusculas:["a"-"z"]>}
JavaCC provides a SKIP section, in which unnecessary characters are skipped.
Please find an example SKIP block below.
SKIP : {
"|"
}
TOKEN : {
<NUMERIC: (["0"-"9","."])+ >
| <ALPHA : (["a"-"z","."])+ >
| <ALPHA_NUM : (<NUMERIC> | <ALPHA>)+ >
}
Sample inputs:
|83.56| --> NUMERIC without "|"
|as.63| --> ALPHA_NUM without "|"
|as.lk| --> ALPHA without "|"
|as45.34| --> ALPHA_NUM without "|"
Note that SKIP only applies between tokens: a character is never skipped in the middle of a match.
E.g.: |83|3.56|
The characters "83" start a match for NUMERIC and ALPHA_NUM, and no rule can extend that match over the "|" that follows, so the token ends there. The "|" is then skipped and "3.56" becomes a second token. The input is therefore lexed as two tokens rather than one, which is an error if the parser expects a single token here.
If the "|" should be part of the token, change the rule as below.
SKIP : {
"|"
}
TOKEN : {
<NUMERIC: (["0"-"9","."])+ >
| <ALPHA : (["a"-"z","."])+ >
| <ALPHA_NUM : (<NUMERIC> | <ALPHA> | "|")+ >
}
Then the input matches ALPHA_NUM including "|". Note that, because the longest match wins, the leading and trailing "|" are then matched as part of ALPHA_NUM as well, rather than skipped.
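The interaction between a skipped character and the token rules can be simulated as follows. This is a sketch under assumptions: `lex`, `SKIP` and `TOKENS` are hypothetical names, the letter range is taken to be a to z, the third rule is named `ALPHA_NUM`, and `[0-9a-z.]+` is used as an equivalent of "one or more NUMERIC or ALPHA runs". Checking the skip pattern first is equivalent to longest match here only because no token rule in this set can start with "|".

```python
import re

SKIP = re.compile(r"\|")
# Token rules in declaration order.
TOKENS = [
    ("NUMERIC", re.compile(r"[0-9.]+")),
    ("ALPHA", re.compile(r"[a-z.]+")),
    ("ALPHA_NUM", re.compile(r"[0-9a-z.]+")),
]


def lex(line):
    """Return the (kind, lexeme) tokens of one input line."""
    out, pos = [], 0
    while pos < len(line):
        skipped = SKIP.match(line, pos)
        if skipped:
            # A skipped character is consumed between tokens and produces nothing.
            pos = skipped.end()
            continue
        best = None
        for kind, regex in TOKENS:
            m = regex.match(line, pos)
            # Longest match wins; the earlier rule is kept on a tie.
            if m and (best is None or m.end() > best[1]):
                best = (kind, m.end())
        if best is None:
            raise ValueError(f"no rule matches {line[pos]!r} at offset {pos}")
        out.append((best[0], line[pos:best[1]]))
        pos = best[1]
    return out
```

In this sketch the four sample lines produce one token each, without the surrounding "|", and `lex("|83|3.56|")` produces two `NUMERIC` tokens, `83` and `3.56`.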

Parsing a variable composed of letters and numbers like " JAVAC 1.7.0.XXX"

I'm trying to parse regular expressions using JavaCC, but I have run into a problem with a variable "Y" composed of letters and numbers, for example: "JAVA 1.7.1.XXX". I have already defined the tokens
<id> = <lettre> | <number>, with < #lettre : [ "A"-"Z", "a"-"z"] > | < #number : [ "0"-"9" ] >. At run time, the parser processes the first part of the variable "Y" as <id>, and then parsing stops. Thanks in advance.
Edit.
Here is the parseur.jj code:
TOKEN : { <ID2 : (["a"-"z","A"-"Z","0"-"9","_"])+
( (["0"-"9"])+ "." (["0"-"9"])+ "." (["0"-"9"])+)+
(["a"-"z","A"-"Z","_","."])+ >}
TOKEN : { <ID : ["a"-"z","A"-"Z","_"] (["a"-"z","A"-"Z","0"-"9","_"])* >}
Suppose the remaining input stream starts with this: MyFile1_Test 1.2.3.txt
Then the token <ID> is matched and not <ID2>. Why does the following rule not apply here? If more than one regular expression describes a prefix, then a regular expression that describes the longest prefix of the input stream is used. (This is called the “maximal munch rule”.) Thank you very much for your help.
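One possible reading of why ID is chosen: the sample input contains a space after MyFile1_Test, and no character class in ID2 admits a space, so ID2 may not describe any prefix of that input at all. The sketch below checks this with Python regular expressions transcribed from the two rules; `prefix`, `ID` and `ID2` are hypothetical names, and Python's `re` is only an approximation of JavaCC's matcher.

```python
import re

# Transcriptions of the ID2 and ID token rules from the question.
ID2 = re.compile(r"[A-Za-z0-9_]+(?:[0-9]+\.[0-9]+\.[0-9]+)+[A-Za-z_.]+")
ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def prefix(regex, text):
    """Return the prefix of text matched by regex, or None if there is none."""
    m = regex.match(text)
    return m.group(0) if m else None
```

With the space, `prefix(ID2, "MyFile1_Test 1.2.3.txt")` is `None` while `prefix(ID, ...)` gives `MyFile1_Test`, so maximal munch has only one candidate to choose from. Without the space, `prefix(ID2, "MyFile1_Test1.2.3.txt")` covers the whole string.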

Escape character for " in ValidationExpression in ASP.NET

I am using a regular expression to filter out invalid input entered by the end user.
The acceptable input is word characters, spaces, digits and . / # , # & $ _ : ? ' % ! – ~ " | + ; ” { } - \.
Below is my code.
<asp:RegularExpressionValidator ID="rgVEditTB1" runat="server" ControlToValidate="txtEditTB1"
ValidationExpression="^[\w\s\d\-\.\/\#\,\#\&\$\:\?\"\'\%\!\–\~\|\+\;\”\{\}\-\\]+$" ErrorMessage="Invalid Special Character" />
However, I am having trouble escaping " in the ValidationExpression; it fails with a
"Server Tag is not well formed" error.
I tried changing the escape sequence to:
\""
\"
""
It also gives me the same error.
What should be the correct escape character to put in the ValidationExpression?
You should be able to pass in the HTML-encoded values. So, passing " would be done by passing &quot;. Something like this: ValidationExpression="^[^&quot;]+$". In this regex I am saying: match any character from the beginning to the end of the string which is not a quotation mark (").
The same applies to the other special symbols. You can take a look here for more encoding values.
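To see which encoded form a given pattern needs before it is pasted into a double-quoted attribute, a small Python sketch can help. `attribute_safe` is a hypothetical helper, and Python's standard `html` module is used purely as an illustration; it is not the encoder ASP.NET itself uses.

```python
import html


def attribute_safe(pattern):
    """Encode &, <, > and both quote characters so the text fits inside an attribute value."""
    return html.escape(pattern, quote=True)


def decoded(attribute_value):
    """Reverse the encoding, i.e. the pattern as it reads once the entities are decoded."""
    return html.unescape(attribute_value)
```

For example, `attribute_safe('^[^"]+$')` gives `^[^&quot;]+$`, and `decoded` turns that back into the original pattern.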

unexpected ')', which I can not figure out

I am getting the following error. I can not figure out what is missing, as I seem to have all my brackets matched up.
Error: unexpected ')' in:
"{
if (grepl(propertiesData[x,'city'],population[z,'NAME'],ignore.case=TRUE) & (propertiesData[x,'stateLong']==population[z,'STATENAME')"
Here is the code of the loop:
for (z in c(1:nrow(population)))
{
if (grepl(propertiesData[x,'city'],population[z,'NAME'],ignore.case=TRUE) & (propertiesData[x,'stateLong']==population[z,'STATENAME'))
{
propertiesData[x,'population']=population[z,'POP_2009']
break
}
}
==population[z,'STATENAME'))
Seems like you forgot the closing bracket. Add it in and see what happens:
==population[z,'STATENAME']))
You're missing one ] at the end of line.
...==population[z,'STATENAME'] ))
You are missing "]" at the end : (propertiesData[x,'stateLong']==population[z,'STATENAME']))
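This kind of mismatch can also be located mechanically. Below is a minimal Python sketch of a bracket checker, assuming simple quoted strings with no escaped quotes; `first_bracket_error` is a hypothetical helper, not an R tool, and it only reports a closer that does not match the most recent opener (brackets left open at the end are not reported).

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def first_bracket_error(code):
    """Return (closer, index) for the first mismatched closer, or None."""
    stack = []
    quote = None
    for i, ch in enumerate(code):
        if quote:
            # Inside a string literal: ignore everything up to the closing quote.
            if ch == quote:
                quote = None
            continue
        if ch in "'\"":
            quote = ch
        elif ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack[-1] != PAIRS[ch]:
                return (ch, i)
            stack.pop()
    return None


BROKEN = "if (grepl(propertiesData[x,'city'],population[z,'NAME'],ignore.case=TRUE) & (propertiesData[x,'stateLong']==population[z,'STATENAME'))"
FIXED = "if (grepl(propertiesData[x,'city'],population[z,'NAME'],ignore.case=TRUE) & (propertiesData[x,'stateLong']==population[z,'STATENAME']))"
```

On `BROKEN` the checker stops at the first `)` after 'STATENAME', because the most recent opener is still `[`; on `FIXED` it returns `None`.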

Resources