Are regexes allowed in BNF and EBNF notations?

If I wanted for example to define the Lisp programming language, where a name can include even non-alphanumeric characters, should I list all the usable characters with a notation like:
validchar ::= "a" | "b" | "c" ... "-" | "*" | "$" ... ;
name ::= validchar, (validchar | digit)+;
Or am I allowed to use regexes, like:
validchar ::= "[^(^)^\s^\d]";
name ::= validchar, (validchar | digit)*;
Or even:
name ::= "[^(^)^\s^\d]", "[^(^)^\s]"*;
This would shorten it a lot, and it would include even characters like ₩, ¥, € and so on, which I can't list but are actually usable.

Whether this is allowed depends on the tool you are using that implements the (E)BNF notation.
Some tools are rather strict and stick to the original definition of (E)BNF, allowing at best Kleene * or + on language tokens. An additional point is that there is no requirement for classic (E)BNF to operate on characters as terminals.
Clearly it is convenient to be able to define some language tokens directly in terms of characters, and one can imagine (as you have) an EBNF in which one can write not only characters as terminals, but also regexes over characters.
Whether the tool you propose to use allows that depends entirely on the tool. Many tools that process (E)BNF, such as YACC, are actually designed to work in conjunction with another tool, a "lexer generator" (for YACC, this is LEX or its common replacement FLEX), that defines character sequences for tokens. With such tool pairs, the (E)BNF tool typically does not allow any mention of characters or regexes over them, but the lexer generator tool explicitly does allow character and regex specifications for tokens.
There are hundreds of (E)BNF and lexer generator tools, each with somewhat (or egregiously) different rules. Check the tool documentation.
Or write it the way you want to write it, and build your own (101st) tool.
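As a rough illustration of the lexer side of such a tool pair, here is a minimal Go sketch in which a regex defines the name token and the grammar would then only ever see the resulting tokens. The function name lexName and the exact character classes are assumptions made for this example (adapted from the question, with the redundant ^ characters removed); no particular (E)BNF tool works exactly this way.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameToken is an assumed token definition: the first character is anything
// except a parenthesis, whitespace, or a digit; the remaining characters are
// anything except a parenthesis or whitespace.
var nameToken = regexp.MustCompile(`^[^()\s\d][^()\s]*`)

// lexName returns the name token at the start of input, if there is one.
func lexName(input string) (string, bool) {
	m := nameToken.FindString(input)
	return m, m != ""
}

func main() {
	for _, s := range []string{"foo-bar* baz", "€x1)", "1abc", "(x)"} {
		name, ok := lexName(s)
		fmt.Printf("%q -> %q (matched: %v)\n", s, name, ok)
	}
}
```

A grammar rule such as name ::= NAME could then refer to the token by name, without mentioning characters at all.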

Related

How to process LaTeX commands in R?

I work with knitr and wish to transform inline LaTeX commands like "\label" and "\ref", depending on the output target (LaTeX or HTML).
In order to do that, I need to (programmatically) generate valid R strings that correctly represent the backslash: for example "\label" should become "\\label". The goal would be to replace all backslashes in a text fragment with double backslashes.
But it seems that I cannot even read these strings, let alone process them. If I define:
okstr <- function(str) "do something"
then when I call
okstr("\label")
I immediately get an "unrecognized escape" error
(of course, since \l is not a valid escape sequence).
So my question is: does anybody know a way to read strings (in R) without using the escaping mechanism?
Yes, I know I could do it manually, but that's the point: I need to do it programmatically.
There are many questions that are close to this one, and I have spent some time browsing, but I have found none that yields a workable solution for this.
Best regards.
Inside R code, you need to adhere to R’s syntactic conventions. And since \ in strings is used as an escape character, it needs to form a valid escape sequence (and \l isn’t a valid escape sequence in R).
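In R versions before 4.0.0 there is simply no way around this. R 4.0.0 and later support raw string literals, so r"(\label)" yields the string \label without any escaping.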
But if you are reading the string from elsewhere, e.g. using readLines, scan or any of the other file reading functions, you are already getting the correct string, and no handling is necessary.
Alternatively, if you absolutely want to write LaTeX-like commands in literal strings inside R, just use a different character for \; for instance, +. Just make sure that your function correctly handles it everywhere, and that you keep a way of getting a literal + back. Here’s a suggestion:
okstr("+label{1 ++ 2}")
The implementation of okstr then needs to replace single + by \, and double ++ by + (making the above result in \label{1 + 2}). But consider in which order this needs to happen, and how you’d like to treat more complex cases; for instance, what should the following yield: okstr("1 +++label")?
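The ordering question is independent of R, so here is a minimal sketch of one possible answer, written in Go only to keep the logic explicit. It rewrites the string in a single left-to-right pass in which "++" is tried before "+" at every position. The Go function okstr is a hypothetical analogue of the R function above, not a translation of any existing implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// plusEscapes rewrites the input in one left-to-right pass. Because "++" is
// listed first, it takes precedence over "+" wherever both could match.
var plusEscapes = strings.NewReplacer("++", "+", "+", `\`)

// okstr is a hypothetical Go analogue of the R function discussed above.
func okstr(s string) string {
	return plusEscapes.Replace(s)
}

func main() {
	fmt.Println(okstr("+label{1 ++ 2}")) // prints \label{1 + 2}
	fmt.Println(okstr("1 +++label"))     // prints 1 +\label
}
```

Two separate passes would go wrong in either order: replacing + by \ first leaves no ++ to find, while replacing ++ by + first means the second pass turns those new + characters into backslashes as well.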

Formatting a map[] in golang

I have a list of hosts inbound in the form of one string separated by commas.
EXAMPLE: "host01,host02,host03,"
I have this line, which produces a slice of strings, but I need a map[string]interface{} instead.
Here is what I have. How do I make it a map[string]interface{}?
• Removing any trailing comma:
hosts := []string{strings.TrimSuffix(hostlist, ",")}
• Later I split them on the comma like this:
hosts = strings.Split(hosts[0], ",")
I just need the names to be the keys; the values come from APIs and are of unknown type, hence interface{}.
Thanks, and forgive me: I know this is super simple, I am just not seeing it.
Loop over your slice of strings. Set each map entry to nil.
There is no fancy syntax like Python's list comprehensions or Perl's freaky group assignments.
And remember that StackOverflow's tag info is often really useful. See: https://stackoverflow.com/tags/go/info
And from there to the language specification. One bit that will help is https://golang.org/ref/spec#For_range if you aren't familiar with Go's for syntax to loop over slices.
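The loop described above might look like the following minimal sketch. The function name hostMap is made up for this example, and skipping empty entries is an assumption about how trailing or doubled commas should be handled.

```go
package main

import (
	"fmt"
	"strings"
)

// hostMap turns a comma-separated list such as "host01,host02,host03,"
// into a map keyed by host name, with every value initially nil.
func hostMap(hostlist string) map[string]interface{} {
	hosts := make(map[string]interface{})
	for _, name := range strings.Split(hostlist, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue // skip empty entries left by trailing or doubled commas
		}
		hosts[name] = nil // placeholder until the API supplies a value
	}
	return hosts
}

func main() {
	hosts := hostMap("host01,host02,host03,")
	fmt.Println(len(hosts))
	_, found := hosts["host02"]
	fmt.Println(found)
}
```

Because empty entries are skipped inside the loop, the separate strings.TrimSuffix step is not needed in this version.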

Textpad: How to serialize / concat multiple replacements

I have to use Textpad in my environment. To process a file (on a regular basis), I need to make about 20 replacements, some of them regex, some of them not. For most of the replacements I have defined macros (one macro per replacement, i.e. 1:1). Is it possible to "concat" macros or put replacements "in a sequence"? If so, would this sequence break if one replacement does not find any matching patterns? (Of course, it should not break.)
I'm not sure how you would "concat" them other than by recording all the replacements together as one macro (unless you know how to concatenate the macro files). But as your question is "would it work", I'd say yes, provided you ensure each macro starts in the right place.
I'd recommend that each macro start and end with something like Ctrl+Home to ensure a consistent starting and ending position.

Avoiding left recursion in parsing LiveScript object definitions

I'm working on a parser for LiveScript language, and am having trouble with parsing both object property definition forms — key: value and (+|-)key — together. For example:
prop: "val"
+boolProp
-boolProp
prop2: val2
I have the key: value form working with this:
Expression ::= TestExpression
| ParenExpression
| OpExpression
| ObjDefExpression
| PropDefExpression
| LiteralExpression
| ReferenceExpression
PropDefExpression ::= Expression COLON Expression
ObjDefExpression ::= PropDefExpression (NEWLINE PropDefExpression)*
// ... other expressions
But however I try to add ("+"|"-") IDENTIFIER to PropDefExpression or ObjDefExpression, I get errors about using left recursion. What's the (right) way to do this?
The grammar fragment you posted is already left-recursive, i.e. without even adding (+|-)boolprop, the non-terminal 'Expression' derives a form in which 'Expression' reappears as the leftmost symbol:
Expression -> PropDefExpression -> Expression COLON Expression
And it's not just left-recursive, it's ambiguous. E.g.
Expression COLON Expression COLON Expression
can be derived in two different ways (roughly, left-associative vs right-associative).
You can eliminate both these problems by using something more restricted on the left of the colon, e.g.:
PropDefExpression ::= Identifier COLON Expression
Also, another ambiguity: Expression derives PropDefExpression in two different ways, directly and via ObjDefExpression. My guess is, you can drop the direct derivation.
Once you've taken care of those things, it seems to me you should be able to add (+|-)boolprop without errors (unless it conflicts with one of the other kinds of expression that you didn't show).
Mind you, looking at the examples at http://livescript.net, I'm doubtful how much of that you'll be able to capture in a conventional grammar. But if you're just going for a subset, you might be okay.
I don't know how much help this will be, because I know nothing about GrammarKit and not much more about the language you're trying to parse.
However, it seems to me that
PropDefExpression ::= Expression COLON Expression
is not quite accurate, and it is creating an ambiguity when you add the boolean property production because an Expression might start with a unary - operator. In the actual grammar, though, a property cannot start with an arbitrary Expression. There are two types of key-property definitions:
name : expression
parenthesized_expression : expression
(Which is to say, an expression used as a key needs to start with an opening parenthesis.)
That means that a boolean property definition, starting with + or -, is recognizable from the first token, which is precisely the condition needed for successful recursive-descent parsing. There are several other property definition syntaxes, including names and parenthesized_expressions not followed by a colon.
That's easy to parse with an LR(1) parser, like the one Jison produces, but to parse it with a recursive-descent parser you need to left-factor. (It's possible that GrammarKit can do this for you, by the way.) Basically, you'd need something like (this is not complete):
PropertyDefinition ::= PropertyPrefix PropertySuffix? | BooleanProperty
PropertyPrefix ::= NAME | ParenthesizedExpression
PropertySuffix ::= COLON Expression | DOT NAME
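
To show how the first token selects the alternative in the left-factored rules above, here is a minimal recursive-descent sketch in Go. It rests on several assumptions that are not in the grammar as posted: BooleanProperty is taken to be (PLUS | MINUS) NAME, a single VALUE token stands in for a full Expression, the ParenthesizedExpression prefix is omitted, and the token kinds and the function parseProperty are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

type kind int

const (
	NAME kind = iota
	COLON
	DOT
	PLUS
	MINUS
	VALUE // assumption: stands in for a full Expression
)

type token struct {
	kind kind
	text string
}

// parseProperty parses one property definition and returns a readable
// description of it. The first token alone decides which rule applies.
func parseProperty(toks []token) (string, error) {
	if len(toks) == 0 {
		return "", errors.New("empty input")
	}
	switch toks[0].kind {
	case PLUS, MINUS: // assumed: BooleanProperty ::= (PLUS | MINUS) NAME
		if len(toks) != 2 || toks[1].kind != NAME {
			return "", errors.New("expected NAME after + or -")
		}
		return fmt.Sprintf("%s = %t", toks[1].text, toks[0].kind == PLUS), nil
	case NAME: // PropertyPrefix PropertySuffix?
		if len(toks) == 1 {
			return toks[0].text + " (no suffix)", nil
		}
		if len(toks) == 3 && toks[1].kind == COLON && toks[2].kind == VALUE {
			return toks[0].text + " = " + toks[2].text, nil
		}
		if len(toks) == 3 && toks[1].kind == DOT && toks[2].kind == NAME {
			return toks[0].text + "." + toks[2].text, nil
		}
		return "", errors.New("expected COLON Expression or DOT NAME after NAME")
	}
	return "", errors.New("a property must start with NAME, + or -")
}

func main() {
	fmt.Println(parseProperty([]token{{PLUS, "+"}, {NAME, "boolProp"}}))
	fmt.Println(parseProperty([]token{{NAME, "prop"}, {COLON, ":"}, {VALUE, `"val"`}}))
}
```

No backtracking is needed here: once the first token has been inspected, only one alternative can still match.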

How to set *readtable* to an empty one in common-lisp?

Standard common-lisp defines many reader macros such as ( and ) for grouping, ' for quote, " for string quotation, | for symbol quotation, # for dispatch macro, etc. Now I want to disable them all and use my own ones, and I have to call set-macro-character one by one to disable them all and then define my own ones.
I have found that there's one way to restore all reader macros to standard ones by calling (setf *readtable* (copy-readtable nil)), but is there a way to set them to empty (i.e., all the characters are treated as normal letters and numbers)?
I don't think there's a way. The expectation is that you're just making incremental modifications to the Lisp reader, not trying to replace it wholesale. It's not really designed to be used that way, because you can't define everything as a macro -- most of the constituent characters are associated with built-in behaviors that can't be defined as reader macros.
