Are shorthand character classes (such as \d) not supported in JavaCC - javacc

I am trying to learn to use JavaCC and realized that it has support for regular expressions. Call me lazy but I thought the default/common way to define digits is a bit too long:
TOKEN : { < #DIGITS : (["0" - "9"])+ >}
I tried using the shorthand character classes such as:
TOKEN : { < #DIGITS : (\d)+ >}
but the "compiler compiler" doesn't seem to like it. I get Lexical errors for the shorthand character. I could not find any documentation on the matter so I am not sure if I am doing something wrong or that it's simply not supported. If anyone can confirm/deny my assumption, that javacc not playing well with the shorthand character classes, I would be very appreciative.

Your finding that it's not supported is correct. Regular expressions in JavaCC are made up only of string literals, references to other regular expressions, and references to the predefined regular expression < EOF >.
However, what you are doing with the code you have there is creating your own shortcut. The number sign means that the symbol is private, i.e., can be used only inside regular expressions. So, defining it as TOKEN : { < #D : (["0" - "9"])+ > } means you could then use < D > within other token definitions.
The example grammar javacc.jj, included with the binary distribution, is the official grammar, so looking in this file you can see just what is parsable by this grammar. The output seems to be a essentially a grammar validator.

Related

How to access map in template string?

I want to use values from gradle.properties that should go into a template string.
A naive first:
println("${project.properties[somekey]}")
doesn't work: Unresolved reference: somekey
So, quotes required?
println("${project.properties[\"somekey\"]}")
is completely broken syntax: Expecting an expression for the first .
I couldn't find any example how to do this, yet the official documentation says expressions.
Question: is it possible to access a map in string template, and if so, how?
Yes and as follows:
"${project.properties["someKey"]}"
assuming the Map has the following signature: Map<String, Any?> (or Map<Any...)
Alternatives:
"${project.properties.getValue("someKey")}"
"${project.properties.getOrElse("someKey") { "lazy-evaluation-default-value" }}"
"${project.properties.getOrDefault("someKey", "someFixedDefaultValue")}"
Basically all the code you put in the ${} is just plain Kotlin code... no further quoting/escaping required, except for the dollar sign $ itself, e.g. use "\$test" if you do not want it to be substituted with a variable named test or """${"$"}test""" if you use a raw string
Note that in this println case the following would have sufficed as well (which also goes for all the shown alternatives above. You may omit the outer surrounding quotes and ${} altogether):
println(project.properties["someKey"])
See also Basic types - String templates

Does R 4.0.0. make it possible to define foo"(...)" operators, similar to the newly introduced r"(...)" syntax?

R 4.0.0 brings in a new syntax for raw strings:
r"(raw string here can contain anything except the closing sequence)"
But this same construct in R 3.x.x produced a syntax error:
Error: unexpected string constant in "r"(asdasd)""
Does it mean that the interpreter was changed in R 4.0.0. ?
And if so - does R 4.0.0. provide a mechanism to define custom functions like foo"()" ?
No, that's not possible at the moment (nor would I anticipate it becoming possible anytime soon).
Here's the NEWS item:
There is a new syntax for specifying raw character constants similar to the one used in C++: r"(...)" with ... any character sequence not containing the sequence )". This makes it easier to write strings that contain backslashes or both single and double quotes. For more details see ?Quotes.
https://cran.r-project.org/doc/manuals/r-devel/NEWS.html
Then from ?Quotes:
Raw character constants are also available using a syntax similar to
the one used in C++: r"(...)" with ... any character
sequence, except that it must not contain the closing sequence
)". The delimiter pairs [] and {} can also be
used, and R can be used in place of r. For additional
flexibility, a number of dashes can be placed between the opening quote
and the opening delimiter, as long as the same number of dashes appear
between the closing delimiter and the closing quote.
https://github.com/wch/r-source/blob/trunk/src/library/base/man/Quotes.Rd
Here's the (git mirror of the SVN patch of the) commit where this functionality was added:
https://github.com/wch/r-source/commit/8b0e58041120ddd56cd3bb0442ebc00a3ab67ebc

How to process latex commands in R?

I work with knitr() and I wish to transform inline Latex commands like "\label" and "\ref", depending on the output target (Latex or HTML).
In order to do that, I need to (programmatically) generate valid R strings that correctly represent the backslash: for example "\label" should become "\\label". The goal would be to replace all backslashes in a text fragment with double-backslashes.
but it seems that I cannot even read these strings, let alone process them: if I define:
okstr <- function(str) "do something"
then when I call
okstr("\label")
I directly get an error "unrecognized escape sequence"
(of course, as \l is faultly)
So my question is : does anybody know a way to read strings (in R), without using the escaping mechanism ?
Yes, I know I could do it manually, but that's the point: I need to do it programmatically.
There are many questions that are close to this one, and I have spent some time browsing, but I have found none that yields a workable solution for this.
Best regards.
Inside R code, you need to adhere to R’s syntactic conventions. And since \ in strings is used as an escape character, it needs to form a valid escape sequence (and \l isn’t a valid escape sequence in R).
There is simply no way around this.
But if you are reading the string from elsewhere, e.g. using readLines, scan or any of the other file reading functions, you are already getting the correct string, and no handling is necessary.
Alternatively, if you absolutely want to write LaTeX-like commands in literal strings inside R, just use a different character for \; for instance, +. Just make sure that your function correctly handles it everywhere, and that you keep a way of getting a literal + back. Here’s a suggestion:
okstr("+label{1 ++ 2}")
The implementation of okstr then needs to replace single + by \, and double ++ by + (making the above result in \label{1 + 2}). But consider in which order this needs to happen, and how you’d like to treat more complex cases; for instance, what should the following yield: okstr("1 +++label")?

What is an example usage of <url-modifier> at CSS url() function?

3.4. Resource Locators: the <url> type describes a <url-modifier> at
A URL is a pointer to a resource and is a functional notation
denoted by <url>. The syntax of a <url> is:
<url> = url( <string> <url-modifier>* )
In addition to the syntax defined above, a can sometimes be
written in other ways:
For legacy reasons, a <url> can be written without quotation marks around the URL itself. This syntax is specially-parsed, and
produces a <url-token> rather than a function syntactically.
[CSS3SYN]
Some CSS contexts, such as #import, allow a <url> to be represented by a <string> instead. This behaves identically to
writing a url() function containing that string. Because these
alternate ways of writing a <url> are not functional notations, they
cannot accept any <url-modifier>s.
Note: The special parsing rules for the legacy quotation mark-less
<url> syntax means that parentheses, whitespace characters, single
quotes (') and double quotes (") appearing in a URL must be escaped
with a backslash, e.g. url(open\(parens), url(close\)parens).
Depending on the type of URL, it might also be possible to write these
characters as URL-escapes (e.g. url(open%28parens) or
url(close%29parens)) as described in[URL]. (If written as a
normal function containing a string, ordinary string escaping rules
apply; only newlines and the character used to quote the string need
to be escaped.)
at
3.4.2. URL Modifiers
The url() function supports specifying additional <url-modifier>s,
which change the meaning or the interpretation of the URL somehow. A
<url-modifier> is either an <ident> or a function.
This specification does not define any <url-modifier>s, but other
specs may do so.
See also CSS Values and Units Module Level 3
Editor’s Draft, 21 March 2016
What are example usages of <ident> and function at url() ?
What are differences between <string> , <ident>, function at url() ?
A <url-modifier> is either an <ident> or a function.
<ident> is an identifier.
A portion of the CSS source that has the same syntax as an <ident-token>.
<ident-token> Syntax ;
I could not find any examples of <ident> used within the url function but
as mentioned in this email there are some possible future uses.
Fetch options to control CORS/cookies/etc
working with Subresource Integrity
Looking at the <ident> syntax you cannot use a key/value pair so i assume
most of this would be implemented using a function which does not yet exist., resource hinting could be implemented using <ident>.
.foo {
background-image: url("//aa.com/img.svg" prefetch);
}
I did however find a "A Collection of Interesting Ideas" with a function <url-modifier> defined.
SVG Parameters (not official spec)
The params() function is a <url-modifier>
.foo {
background-image: url("//aa.com/img.svg" param(--color var(--primary-color)));
}

Avoiding left recursion in parsing LiveScript object definitions

I'm working on a parser for LiveScript language, and am having trouble with parsing both object property definition forms — key: value and (+|-)key — together. For example:
prop: "val"
+boolProp
-boolProp
prop2: val2
I have the key: value form working with this:
Expression ::= TestExpression
| ParenExpression
| OpExpression
| ObjDefExpression
| PropDefExpression
| LiteralExpression
| ReferenceExpression
PropDefExpression ::= Expression COLON Expression
ObjDefExpression ::= PropDefExpression (NEWLINE PropDefExpression)*
// ... other expressions
But however I try to add ("+"|"-") IDENTIFIER to PropDefExpression or ObjDefExpression, I get errors about using left recursion. What's the (right) way to do this?
The grammar fragment you posted is already left-recursive, i.e. without even adding (+|-)boolprop, the non-terminal 'Expression' derives a form in which 'Expression' reappears as the leftmost symbol:
Expression -> PropDefExpression -> Expression COLON Expression
And it's not just left-recursive, it's ambiguous. E.g.
Expression COLON Expression COLON Expression
can be derived in two different ways (roughly, left-associative vs right-associative).
You can eliminate both these problems by using something more restricted on the left of the colon, e.g.:
PropDefExpression ::= Identifier COLON Expression
Also, another ambiguity: Expression derives PropDefExpression in two different ways, directly and via ObjDefExpression. My guess is, you can drop the direct derivation.
Once you've taken care of those things, it seems to me you should be able to add (+|-)boolprop without errors (unless it conflicts with one of the other kinds of expression that you didn't show).
Mind you, looking at the examples at http://livescript.net, I'm doubtful how much of that you'll be able to capture in a conventional grammar. But if you're just going for a subset, you might be okay.
I don't know how much help this will be, because I know nothing about GrammarKit and not much more about the language you're trying to parse.
However, it seems to me that
PropDefExpression ::= Expression COLON Expression
is not quite accurate, and it is creating an ambiguity when you add the boolean property production because an Expression might start with a unary - operator. In the actual grammar, though, a property cannot start with an arbitrary Expression. There are two types of key-property definitions:
name : expression
parenthesized_expression : expression
(Which is to say, expressions need to start with a ().
That means that a boolean property definition, starting with + or - is recognizable from the first token, which is precisely the condition needed for successful recursive descent parsing. There are several other property definition syntaxes, including names and parenthesized_expressions not followed by a :
That's easy to parse with an LR(1) parser, like the one Jison produces, but to parse it with a recursive-descent parser you need to left-factor. (It's possible that GrammarKit can do this for you, by the way.) Basically, you'd need something like (this is not complete):
PropertyDefinition ::= PropertyPrefix PropertySuffix? | BooleanProperty
PropertyPrefix ::= NAME | ParenthesizedExpression
PropertySuffix ::= COLON Expression | DOT NAME

Resources