I need to parse a string using JavaCC that contains single quotes as part of the string

I have defined grammar rules like
TOKEN : { < SINGLE_QUOTE : " ' " > }
TOKEN : { < STRING_LITERAL : " ' " (~["\n","\r"])* " ' ">
But I am not able to parse sequences like 're'd'. I need the parser to parse re'd as a string literal, but with these rules the parser parses 're' separately and 'd' separately.

If you need to lex re'd as a STRING_LITERAL token, use the following rules:
TOKEN : { < SINGLE_QUOTE : "'" > }
TOKEN : { < STRING_LITERAL : "'"? (~["\n","\r"])* "'"? > }
I didn't see a rule for matching "re" separately.
Your STRING_LITERAL definition requires the token to start with a single quote ("'"), but your input does not start with one.
The "?" added to STRING_LITERAL makes each single quote optional (at most one occurrence), so this rule will match your input and lex it as STRING_LITERAL.
JavaCC decision-making rules:
1.) JavaCC looks for the longest match.
In this case, when the input starts with "'", both SINGLE_QUOTE and STRING_LITERAL are possible matches; the characters that follow make STRING_LITERAL the longer match, so it is chosen.
2.) When several matches are equally long, JavaCC takes the rule declared first in the grammar.
If the input is only "'", it is lexed as SINGLE_QUOTE even though both SINGLE_QUOTE and STRING_LITERAL match.
Hope this helps.
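The two decision rules can be tried out without running JavaCC. The following is only a rough sketch: it mirrors the two token definitions as java.util.regex patterns (an approximation, since java.util.regex is a backtracking engine and does not guarantee longest-match in general, although it agrees for these simple patterns). The class name TokenChoiceSketch and the method choose are hypothetical helpers, not part of JavaCC.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenChoiceSketch {
    // Rules in declaration order; LinkedHashMap preserves that order.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("SINGLE_QUOTE", Pattern.compile("'"));
        RULES.put("STRING_LITERAL", Pattern.compile("'?[^\\n\\r]*'?"));
    }

    /** Returns "KIND:image" for the chosen token, or null if no rule matches a non-empty prefix. */
    static String choose(String input) {
        String bestKind = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strict ">" means a later rule only wins when its match is longer,
            // so ties go to the rule declared first.
            if (m.lookingAt() && m.end() > bestLength) {
                bestKind = rule.getKey();
                bestLength = m.end();
            }
        }
        return bestKind == null ? null : bestKind + ":" + input.substring(0, bestLength);
    }

    public static void main(String[] args) {
        System.out.println(choose("'"));      // SINGLE_QUOTE:'
        System.out.println(choose("re'd"));   // STRING_LITERAL:re'd
        System.out.println(choose("'re'd'")); // STRING_LITERAL:'re'd'
    }
}
```

Note that the STRING_LITERAL pattern here can also match the empty string; the sketch ignores empty matches.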

The following should work:
TOKEN : { < SINGLE_QUOTE : "'" > }
TOKEN : { < STRING_LITERAL : "'" (~["\n","\r"])* "'"> }
This is pretty much what you had, except that I removed some spaces.
Now if there are two or more apostrophes on a line (i.e. without an intervening newline or return), then the first and the last of those apostrophes, together with all characters between them, should be lexed as one STRING_LITERAL token. That includes all intervening apostrophes. This is assuming there are no other rules involving apostrophes. For example, if your file is 're'd' that should lex as one token; likewise 'abc' + 'def' should lex as one token.
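The first-to-last-apostrophe behaviour can be approximated with a greedy regular expression. This is a sketch only: it uses java.util.regex rather than a JavaCC-generated token manager, and GreedyLiteralSketch and literals are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyLiteralSketch {
    // Regex analogue of "'" (~["\n","\r"])* "'": the greedy star runs to the
    // last apostrophe on the line.
    private static final Pattern STRING_LITERAL = Pattern.compile("'[^\\n\\r]*'");

    /** Collects every STRING_LITERAL-like match in the input, in order. */
    static List<String> literals(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(literals("'re'd'"));             // ['re'd']
        System.out.println(literals("'abc' + 'def'\n'x'")); // ['abc' + 'def', 'x']
    }
}
```

The second call shows that a newline ends the span, so each line yields at most one such token.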

Related

How to exclude character " in a token JavaCC

Hello, I'm working with JavaCC and I am writing a token that matches a string between " ". Context:
void literalString(): {} { """ (characteresString())? """ }
void characteresString(): {} { <characterString> | characteresString() <characterString> }
So I made this token to match the characters of the string:
TOKEN : {<characterString : ~["\", "] >}
The problem is that I don't know how to exclude the " symbol in the token: if I put """ it gives me an error, and if I put a single " I get an error again.
Thank you in advance.
Instead of
void literalString(): {} { """ (characteresString())? """ }
use a token definition
TOKEN : { <STRING : "\"" (<CHAR>)* "\"" >
| <#CHAR : ~["\""] > // Any character that is not "
}
Now this defines a string to be a ", followed by zero or more characters that are not "s, followed by another ".
However, some languages have further restrictions, such as only allowing characters in a certain range. For example, if only printable ASCII characters excluding "s were allowed, then you would use
TOKEN : { <STRING : "\"" (<CHAR>)* "\"" >
| <#CHAR: [" ","!","#"-"~"]> // Printable ASCII characters excluding "
}
But say you want to allow " characters if they are preceded by a \, and you want to ban \ characters unless they are followed by a " or another \ or an n. Then you could use
TOKEN : { <STRING : "\"" (<CHAR> | <ESCAPESEQ>)* "\"" >
| <#CHAR: [" ","!","#"-"[","]"-"~"] > // Printable ASCII characters excluding \ and "
| <#ESCAPESEQ: "\\" ["\"","\\","n"] > // 2-character sequences \\, \", and \n
}
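To see how the three definitions differ, they can be mirrored as java.util.regex patterns and tried on a few inputs. This is a sketch under that assumption, not JavaCC output; StringTokenSketch, ANY, PRINTABLE, ESCAPED and isString are hypothetical names.

```java
import java.util.regex.Pattern;

final class StringTokenSketch {
    // CHAR = ~["\""]: anything except a double quote.
    static final Pattern ANY = Pattern.compile("\"[^\"]*\"");
    // CHAR = [" ","!","#"-"~"]: printable ASCII except the double quote.
    static final Pattern PRINTABLE = Pattern.compile("\"[ !#-~]*\"");
    // CHAR excludes \ and "; ESCAPESEQ is a backslash followed by ", \ or n.
    static final Pattern ESCAPED =
            Pattern.compile("\"(?:[ !#-\\[\\]-~]|\\\\[\"\\\\n])*\"");

    /** True if the whole text is one string token under the given rule. */
    static boolean isString(Pattern rule, String text) {
        return rule.matcher(text).matches();
    }

    public static void main(String[] args) {
        String withTab = "\"a\tb\"";          // "a<TAB>b"
        String withEscapes = "\"a\\\"b\\n\""; // "a\"b\n" with literal backslashes
        String badEscape = "\"a\\qb\"";       // "a\qb"

        System.out.println(isString(ANY, withTab));         // true
        System.out.println(isString(PRINTABLE, withTab));   // false
        System.out.println(isString(ANY, withEscapes));     // false
        System.out.println(isString(ESCAPED, withEscapes)); // true
        System.out.println(isString(ESCAPED, badEscape));   // false
    }
}
```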

Parsing a variable composed of letters and numbers like "JAVAC 1.7.0.XXX"

I'm trying to parse regular expressions using JavaCC, but I have run into a problem with a variable "Y" composed of letters and numbers, for example "JAVA 1.7.1.XXX". I have already defined the token
<id > = < lettre > | <number> < #lettre : [ "A"-"Z", "a"-"z"]> | < #number : [ "0"-"9" ] >
During execution, the parser treats the first part of the variable "Y" as <id>, and then parsing stops. Thanks in advance.
Edit.
Here is the parseur.jj code:
TOKEN : { <ID2 : (["a"-"z","A"-"Z","0"-"9","_"])+ ( (["0"-"9"])+ "." (["0"-"9"])+ "." (["0"-"9"])+)+ (["a"-"z","A"-"Z","_","."])+ >}
TOKEN : { <ID : ["a"-"z","A"-"Z","_"] (["a"-"z","A"-"Z","0"-"9","_"])* >}
Suppose the remaining input stream starts with: MyFile1_Test 1.2.3.txt
Then the token <ID> is matched and not <ID2>. Why does the following rule not apply?
If more than one regular expression describes a prefix, then a regular expression that describes the longest prefix of the input stream is used. (This is called the “maximal munch rule”.)
Thank you very much for your help.
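One way to investigate is to mirror both token definitions as java.util.regex patterns and measure the prefix each one can match. The sketch below rests on that assumption (java.util.regex is not JavaCC and does not guarantee longest-match in general, though it agrees for these inputs); PrefixLengthSketch and prefixLength are hypothetical names. It suggests that ID2 cannot match any prefix of the sample input, because nothing in ID2 can consume the space after MyFile1_Test, so maximal munch only has ID to choose from.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrefixLengthSketch {
    static final Pattern ID2 = Pattern.compile(
            "[a-zA-Z0-9_]+(?:[0-9]+\\.[0-9]+\\.[0-9]+)+[a-zA-Z_.]+");
    static final Pattern ID = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    /** Length of the prefix of input matched by rule, or 0 if no prefix matches. */
    static int prefixLength(Pattern rule, String input) {
        Matcher m = rule.matcher(input);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        String withSpace = "MyFile1_Test 1.2.3.txt";
        String withoutSpace = "MyFile1_Test1.2.3.txt";

        System.out.println(prefixLength(ID2, withSpace));    // 0
        System.out.println(prefixLength(ID, withSpace));     // 12
        System.out.println(prefixLength(ID2, withoutSpace)); // 21
        System.out.println(prefixLength(ID, withoutSpace));  // 13
    }
}
```

Without the space, ID2 describes the longer prefix and would win under maximal munch.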

Lexical error at line 0, column 0

In the grammar below, I am trying to configure any line that starts with ' as a single-line comment and anything between /' and '/ as a multi-line comment (/' Multiline Comment '/). The single-line comment works OK. But for some reason, as soon as I type / or ' or ; or < or > I get the error below. I don't have the above characters configured. Shouldn't they be handled by default and skipped during parsing?
Error
Lexical error at line 0, column 0. Encountered: "\"" (34), after : ""
Lexical error at line 0, column 0. Encountered: ">" (62), after : ""
Lexical error at line 0, column 0. Encountered: "\n" (10), after : "-"
I have only included part of the code below for conciseness. For full Lexer definition please visit the link
TOKEN :
{
< WHITESPACE:
" "
| "\t"
| "\n"
| "\r"
| "\f">
}
/* COMMENTS */
MORE :
{
<"/'"> { input_stream.backup(1); } : IN_MULTI_LINE_COMMENT
}
<IN_MULTI_LINE_COMMENT>
TOKEN :
{
<MULTI_LINE_COMMENT: "'/" > : DEFAULT
}
<IN_MULTI_LINE_COMMENT>
MORE :
{
< ~[] >
}
TOKEN :
{
<SINGLE_LINE_COMMENT: "'" (~["\n", "\r"])* ("\n" | "\r" | "\r\n")?>
}
I can't reproduce every aspect of your problem. You say there is an error "as soon as" you enter certain characters. Here is what I get.
/ There is an error only if the next character is not a '.
' I see no error. This is correctly treated as the start of a comment.
; There is always an error. No token can start with ;.
< There is an error only if the next characters are not - or <-.
> There is always an error. No token can start with >.
I'm not exactly sure why you would expect these not to be errors, since your lexer has no rules to cover these cases. Generally when there is no rule to match a prefix of the input and the input is not exhausted, there will be a TokenMgrError thrown.
If you want to eliminate all these TokenMgrErrors, make a catch-all rule (as explained in the FAQ):
TOKEN: { <UNEXPECTED_CHARACTER: ~[] > }
Make sure this is the very last rule in the .jj file. This rule says that, when no other rule applies, the next character is treated as an UNEXPECTED_CHARACTER token. Of course this just boots the problem up to the parsing level. If you really want the tokenizer to skip all characters that don't belong, just use the following rule as the very last rule:
SKIP : { < ~[] > }
For most languages, that would be an odd thing to do, which is why it is not the default.
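The difference between the three behaviours (no catch-all, a catch-all TOKEN, a catch-all SKIP) can be illustrated with a small hand-written scanner. This is only a sketch: it is not what JavaCC generates, it covers just two of the grammar's rules, and CatchAllSketch, Fallback and tokenize are made-up names. The IllegalStateException stands in for TokenMgrError.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CatchAllSketch {
    enum Fallback { ERROR, UNEXPECTED_TOKEN, SKIP }

    private static final Pattern WHITESPACE = Pattern.compile("[ \\t\\n\\r\\f]");
    private static final Pattern SINGLE_LINE_COMMENT =
            Pattern.compile("'[^\\n\\r]*(?:\\r\\n|\\n|\\r)?");

    /** Returns the kinds of the tokens found, applying the fallback when no rule matches. */
    static List<String> tokenize(String input, Fallback fallback) {
        List<String> kinds = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Matcher ws = WHITESPACE.matcher(input).region(pos, input.length());
            Matcher comment = SINGLE_LINE_COMMENT.matcher(input).region(pos, input.length());
            if (ws.lookingAt()) {
                kinds.add("WHITESPACE");
                pos = ws.end();
            } else if (comment.lookingAt()) {
                kinds.add("SINGLE_LINE_COMMENT");
                pos = comment.end();
            } else if (fallback == Fallback.ERROR) {
                // No rule matches and there is no catch-all: lexical error.
                throw new IllegalStateException(
                        "Lexical error at offset " + pos + ": '" + input.charAt(pos) + "'");
            } else {
                // Catch-all ~[]: consume exactly one character.
                if (fallback == Fallback.UNEXPECTED_TOKEN) {
                    kinds.add("UNEXPECTED_CHARACTER");
                }
                pos++;
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("; 'note", Fallback.UNEXPECTED_TOKEN));
        // [UNEXPECTED_CHARACTER, WHITESPACE, SINGLE_LINE_COMMENT]
        System.out.println(tokenize("; 'note", Fallback.SKIP));
        // [WHITESPACE, SINGLE_LINE_COMMENT]
    }
}
```

Because the fallback branch is tried only after every other rule fails, it plays the same role as placing the catch-all rule last in the .jj file.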

Escape character for " in ValidationExpression in ASP.NET

I am using a regular expression to filter out invalid input entered by the end user.
The acceptable input is word characters, spaces, digits and . / # , # & $ _ : ? ' % ! – ~ " | + ; ” { } - \.
Below is my code.
<asp:RegularExpressionValidator ID="rgVEditTB1" runat="server" ControlToValidate="txtEditTB1"
ValidationExpression="^[\w\s\d\-\.\/\#\,\#\&\$\:\?\"\'\%\!\–\~\|\+\;\”\{\}\-\\]+$" ErrorMessage="Invalid Special Character" />
However, I am having a problem escaping " in the ValidationExpression; it errors out with
"Server Tag is not well formed".
I tried to change the escape character to:
\""
\"
""
It also gives me the same error.
What should be the correct escape character to put in the ValidationExpression?
You should be able to pass in the HTML-encoded values. So, passing &quot; would be like passing ". Something like this: ValidationExpression="^[^&quot;]+$". In this regex I am saying: match any string, from beginning to end, that contains no quotation mark (").
The same applies to the other special symbols. You can take a look here for more encoding values.
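As an illustration of the encoding step, the sketch below HTML-encodes a regular expression so it can sit inside a double-quoted attribute. It is written in Java purely as an example and is not an ASP.NET API; AttributeEncodeSketch and encodeForAttribute are hypothetical names, and only the four most common characters are handled.

```java
final class AttributeEncodeSketch {
    /** Replaces the characters that would break a double-quoted attribute value. */
    static String encodeForAttribute(String raw) {
        StringBuilder out = new StringBuilder();
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String regex = "^[^\"]+$";
        System.out.println("ValidationExpression=\"" + encodeForAttribute(regex) + "\"");
        // ValidationExpression="^[^&quot;]+$"
    }
}
```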

escaping string for json result in asp.net server side operation

I have a server side operation manually generating some json response. Within the json is a property that contains a string value.
What is the easiest way to escape the string value contained within this json result?
So this
string result = "{ \"propName\" : '" + (" *** \\\"Hello World!\\\" ***") + "' }";
would turn into
string result = "{ \"propName\" : '" + SomeJsonConverter.EscapeString(" *** \\\"Hello World!\\\" ***") + "' }";
and result in the following json
{ \"propName\" : '*** \"Hello World!\" ***' }
First of all, I don't think implementing serialization manually is a good idea. You should do this mostly for study purposes, or if you have another very important reason why you cannot use the standard .NET classes (for example, you have to use .NET 1.0-3.0 and not higher).
Now back to your code. The results you currently produce are not in JSON format. You should place the property name and property value in double quotes:
{ "propName" : "*** \"Hello World!\" ***" }
As you can read on http://www.json.org/, the double quote is not the only character that must be escaped. The backslash character must also be escaped. You can verify your JSON results on http://www.jsonlint.com/.
If you also implement deserialization manually, you should know that there are more escape sequences in addition to \" and \\: \/, \b, \f, \n, \r, \t and \u followed by 4 hexadecimal digits.
As I wrote at the beginning of my answer, it is better to use standard .NET classes like DataContractJsonSerializer or JavaScriptSerializer. If you have to use .NET 2.0 and not higher, you can use Json.NET.
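If manual escaping really is unavoidable, the escaping rules can be sketched as follows. This is a minimal Java illustration rather than a replacement for a real serializer; JsonEscapeSketch, escape and property are hypothetical names, and the handling of other control characters as \u sequences follows the rules published on json.org.

```java
final class JsonEscapeSketch {
    /** Escapes a value so it can be placed between double quotes in JSON. */
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters need a \ u escape with 4 hex digits.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** Builds a one-property JSON object with both name and value in double quotes. */
    static String property(String name, String value) {
        return "{ \"" + escape(name) + "\" : \"" + escape(value) + "\" }";
    }

    public static void main(String[] args) {
        System.out.println(property("propName", "*** \"Hello World!\" ***"));
        // { "propName" : "*** \"Hello World!\" ***" }
    }
}
```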
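You may try something like the following (JavaScript; regular expressions with the g flag are used so that every occurrence is replaced, not just the first):
string.replace(/(\\|")/g, "\\$1").replace(/\n/g, "\\n").replace(/\r/g, "\\r");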
