Using ANTLR with Left-Recursive Rules

Basically, I've written a parser for a language with just the basic arithmetic operators (+, -, *, /), but for the plus and minus cases the generated Abstract Syntax Tree parses them as right-associative when they need to be left-associative. Having googled for a solution, I found a tutorial that suggests rewriting the rule from:
Expression ::= Expression <operator> Term | Term
to
Expression ::= Term <operator> Expression*
However, in my head this seems to generate the tree the wrong way round. Any pointers on a way to resolve this issue?

First, I think you meant
Expression ::= Term (<operator> Expression)*
Back to your question: You do not need to "resolve the issue", because ANTLR has no problem dealing with tail recursion. I'm nearly certain that it replaces tail recursion with a loop in the code that it generates. This tutorial (search for the chapter called "Expressions" on the page) explains how to arrive at the e1 = e2 (op e2)* structure. In general, though, you define expressions in terms of higher-priority expressions, so the actual recursive call happens only when you process parentheses and function parameters:
expression : relationalExpression (('and'|'or') relationalExpression)*;
relationalExpression : addingExpression ((EQUALS|NOT_EQUALS|GT|GTE|LT|LTE) addingExpression)*;
addingExpression : multiplyingExpression ((PLUS|MINUS) multiplyingExpression)*;
multiplyingExpression : signExpression ((TIMES|DIV|'mod') signExpression)*;
signExpression : (PLUS|MINUS)* primeExpression;
primeExpression : literal | variable | LPAREN expression /* recursion!!! */ RPAREN;
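To make the associativity point concrete, here is a minimal Python sketch (not ANTLR output; the function names and the tuple-based tree are assumptions made for illustration). It contrasts a right-recursive rule with the loop form e1 = e2 (op e2)* on a flat token list:

```python
def parse_right_recursive(tokens):
    """Expression ::= Term <operator> Expression | Term  (right-associative)."""
    if len(tokens) == 1:
        return tokens[0]
    left, op, rest = tokens[0], tokens[1], tokens[2:]
    # The remainder of the input becomes the RIGHT child.
    return (op, left, parse_right_recursive(rest))


def parse_loop(tokens):
    """Expression ::= Term (<operator> Term)*  (left-associative)."""
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        op, right = tokens[i], tokens[i + 1]
        # The tree built so far becomes the LEFT child.
        tree = (op, tree, right)
    return tree


def evaluate(tree):
    """Evaluate a tuple tree containing only + and - nodes."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l - r


tokens = [10, "-", 4, "-", 3]
right_tree = parse_right_recursive(tokens)  # ("-", 10, ("-", 4, 3))
left_tree = parse_loop(tokens)              # ("-", ("-", 10, 4), 3)
```

With these trees, evaluate(right_tree) gives 9 while evaluate(left_tree) gives 3, which is the left-associative result you want for 10 - 4 - 3. In an ANTLR grammar the same folding would typically be done in the actions or tree-rewrite rules attached to the loop.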

What does => mean in Ada?

I understand when and how to use => in Ada, specifically when using the keyword 'others', but I am not sure of its proper name nor how and why it was created. The history and development of Ada is very interesting to me and I would appreciate anyone's insight on this.
=> is called arrow. It is used in any named association (for example with named parameters and in aggregates), not only with the choice 'others'.
Section 6.4 of the Ada Reference Manual states:
parameter_association ::= [formal_parameter_selector_name =>] explicit_actual_parameter
explicit_actual_parameter ::= expression | variable_name
A parameter_association is named or positional according to whether or
not the formal_parameter_selector_name is specified. Any positional
associations shall precede any named associations. Named associations
are not allowed if the prefix in a subprogram call is an
attribute_reference.
Similarly, array aggregates are described in section 4.3.3:
array_aggregate ::= positional_array_aggregate | named_array_aggregate
positional_array_aggregate ::=
    (expression, expression {, expression})
  | (expression {, expression}, others => expression)
  | (expression {, expression}, others => <>)
named_array_aggregate ::=
    (array_component_association {, array_component_association})
array_component_association ::=
    discrete_choice_list => expression
  | discrete_choice_list => <>
The arrow is used to associate an array index with a specific value or to associate a formal parameter name of a subprogram with the actual parameter.
Stack Overflow isn’t really the place for this kind of question, which is why it's received at least one close vote.
That said, "arrow" has been present in the language since its first version; see ARM83 2.2. See also the Ada 83 Rationale; section 3.5 seems to be the first place where it’s actually used, though not by name.
As a complement to Jim's answer, on the usage/intuitiveness side: the arrow X => A means, in various places of the Ada syntax, "value A goes to place X". It is very practical, for instance, for filling an array in an arbitrary cell order. See slide 8 of this presentation for an application with large arrays. Needless to say, the absence of the arrow notation would lead to a heap of bugs in such a case. Sometimes it is just useful for making the associations more readable. You can see it here in action for designing a game level.

Writing a syntax analyser (parser) from BNF without using a generator

I'm interested in writing a parser (syntax analyser) from a BNF grammar without using a generator tool like yacc or bison.
I take as an example the BNF for simple arithmetic expressions (taken from the Dragon Book, section 2.2.6):
expr -> expr + term | expr - term | term
term -> term * factor | term / factor | factor
factor -> number | (expr)
Suppose I would like to create the equivalent code.
I guess I should create three functions called parseExpr, parseTerm and parseFactor.
How do I construct these functions from the BNF defined above?
For parseFactor it seems to be obvious:
1. Get the token type.
2. If the type is number, return a node for the number type.
3. If the token represents an opening parenthesis, get the node returned by parseExpr.
4. Check that the next token is a closing parenthesis. If yes, return the node obtained at 3. If no, throw an error.
For parseExpr, I'm a little bit confused about how to start and how to interpret the production: expr -> expr + term | expr - term | term
How do I translate this production? How do I detect which case applies? Same question for the last production.
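One common way to handle this, sketched here in Python under stated assumptions rather than as the definitive method: the left-recursive productions cannot be turned into functions directly (parseExpr would call itself forever before consuming a token), so they are usually rewritten as expr -> term (('+' | '-') term)* and term -> factor (('*' | '/') factor)*. The case that applies is then detected by peeking at the next token. The tokenizer, the Parser class and the tuple-based nodes below are all illustrative choices, not something taken from the Dragon Book.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it (None at end of input)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        """Consume and return the next token."""
        tok = self.peek()
        self.pos += 1
        return tok

    # expr -> term (('+' | '-') term)*
    def parse_expr(self):
        node = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.parse_term())  # fold to the left
        return node

    # term -> factor (('*' | '/') factor)*
    def parse_term(self):
        node = self.parse_factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.parse_factor())
        return node

    # factor -> number | '(' expr ')'
    def parse_factor(self):
        tok = self.advance()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            node = self.parse_expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    """Parse a whole expression and reject trailing tokens."""
    parser = Parser(tokenize(text))
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

For example, parse("1 - 2 - 3") returns ("-", ("-", 1, 2), 3), so subtraction stays left-associative even though the functions themselves are no longer left-recursive.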

PEGKit Keep trying rules

Suppose I have a rule:
myCoolRule:
Word
| 'myCoolToken' Word otherRule
I supply "myCoolToken something else" as input. When it attempts to parse, it greedily matches myCoolToken as a Word, then hits "something" and reports that it expected EOF. If I rearrange the alternatives so that it attempts to match 'myCoolToken' first, all is good and that input parses perfectly.
I am wondering if it is possible for it to keep trying all the alternatives in that rule to see if any of them works. So it matches Word, fails, comes back, and then tries the next alternative.
Here are the actual grammar rules causing problems:
columnName = Word;
typeName = Word;
//accepts CAST and cast
cast = { MATCHES_IGNORE_CASE(LS(1), #"CAST") }? Word ;
checkConstraint = 'CHECK' '('! expr ')'!;
expr = requiredExp optionalExp*;
requiredExp = (columnName
| cast '(' expr as typeName ')'
... more but not important
optionalExp ...not important
The input CHECK( CAST( abcd as defy) ) causes it to fail, even though it is valid.
Is there a construct, or some other way, to make it try all the alternatives before giving up?
Creator of PEGKit here.
If I understand your question, No, this is not possible. But this is a feature of PEGKit, not a bug.
Your question is related to "determinacy" vs "nondeterminacy". PEGKit is a "deterministic" toolkit (which is widely considered a desirable feature for parsing programming languages).
It seems you are looking for a more "nondeterministic" behavior in this case, but I don't think you should be :).
PEGKit allows you to specify the priority of alternate options via the order in which the alternate options are listed. So:
foo = highPriority
| lowerPriority
| lowestPriority
;
If the highPriority option matches the current input, the lowerPriority and lowestPriority options will not get a chance to try to match, even if they are somehow a "better" match (i.e. they match more tokens than highPriority).
Again, this is related to "determinacy" (highPriority is guaranteed to be given primacy) and is widely considered a desirable feature when parsing programming languages.
So if you want your cast() expressions to have a higher priority than columnName, simply list the cast() expression as an option before the columnName option.
requiredExp = (cast '(' expr as typeName ')'
| columnName
... more but not important
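To illustrate why the order matters, here is a small Python sketch of prioritized choice. It is a simplified model, not PEGKit's implementation or API: the helper names (word, literal, seq, choice) are made up for this example, and tokens are plain strings.

```python
def word(tokens, pos):
    """Match any single token; return the new position, or None on failure."""
    return pos + 1 if pos < len(tokens) else None


def literal(text):
    """Match one token that is exactly `text`."""
    def match(tokens, pos):
        return pos + 1 if pos < len(tokens) and tokens[pos] == text else None
    return match


def seq(*parsers):
    """Match each parser in turn; fail if any of them fails."""
    def match(tokens, pos):
        for p in parsers:
            pos = p(tokens, pos)
            if pos is None:
                return None
        return pos
    return match


def choice(*alternatives):
    """Prioritized choice: commit to the first alternative that matches."""
    def match(tokens, pos):
        for alt in alternatives:
            end = alt(tokens, pos)
            if end is not None:
                return end  # later alternatives never get a chance
        return None
    return match


def parses_fully(rule, tokens):
    """True if the rule consumes the entire input."""
    return rule(tokens, 0) == len(tokens)


cool = seq(literal("myCoolToken"), word, word)  # 'myCoolToken' Word otherRule
word_first = choice(word, cool)
cool_first = choice(cool, word)

tokens = ["myCoolToken", "something", "else"]
```

Here parses_fully(word_first, tokens) is False, because the Word alternative succeeds on the first token and the longer alternative is never tried, while parses_fully(cool_first, tokens) is True. A lone word such as ["hello"] still parses with cool_first, since the first alternative fails and the second one is tried.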
OK, so that takes care of the syntactic details. However, if you have higher-level semantic constraints which can affect parse-time decisions about which alternative should have the highest priority, you should use a Semantic Predicate, like:
foo = { shouldChooseOpt1() }? opt1
| { shouldChooseOpt2() }? opt2
| defaultOpt
;
More details on Semantic Predicates here.

SML '97: what is exactly the standard syntax?

I came to this question when I wanted to check something about the syntax of functor declarations. I found two contradictory syntax definitions, even though the syntax of Standard ML '97, as its name suggests, is supposed to be part of a standard, defined in “The definition of Standard ML — Revised”.
From the book
“The definition of Standard ML — Revised”, by R. Milner, page 14, on Google Books says:
fundec ::= functor funbind
funbind ::= funid (strid : sigexp) = strexp <and funbind>
I read it as “A functor gets exactly one argument and cannot be said to match a signature”.
From another reliable source
“Standard ML syntax summary”, by L. Paulson, page 2, on PDF says (schema approximately re‑expressed using the same notation as in the definition of SML '97):
FunctorDeclaration ::= functor FunctorBinding <and FunctorBinding>
FunctorBinding ::= Ident ( FunctorArguments ) : Signature = Structure
FunctorArguments ::= Ident : Signature | Specification
I read it as “A functor may get multiple arguments and may be said to match a signature”.
Question
The two documents say different things, so I'm confused. What is the real definition of Standard ML '97? Or am I just misreading the standard definition?
Chapters 2 and 3 of the Definition only give the bare syntax of the language. That's extended by the "derived forms" (i.e., syntactic sugar) defined in Appendix A, which include the funid (spec) form (which is short for funid (X : sig spec end) with X being opened on the RHS).
See here for a complete SML grammar including all derived forms.

How can I express a type in F# that optionally recurses on itself (infinitely)

As a learning exercise I am trying to implement a parser for the graphviz dot language (The DOT language) using the functional parser library fparsec (FParsec). The language describes graphs.
Looking at the language definition I was compelled to write down the following definition:
let rec pstmt_list = opt(pstmt .>> opt(pchar ';') >>. opt pstmt_list)
Where pstmt and pchar ';' are parsers, .>> and >>. combine an occurrence of the left parser with an occurrence of the right parser, and opt parses an optional occurrence of its argument parser as an option value. However, this definition does not work; the compiler complains "... the resulting type would be infinite ...".
This example is probably most easily understood by taking a look at the DOT language linked above.
I am aware of the following seemingly linked questions:
Are Infinite Types (aka Recursive Types) not possible in F#?
Haskell to F# - declare a recursive types in f#
But my F# knowledge may not be sufficient to translate them yet, if they apply here at all.
FParsec provides special combinators for parsing sequences. Normally you should prefer these combinators to reimplementing them with a recursive function. You can find an overview of the available combinators for parsing sequences here: http://www.quanttec.com/fparsec/reference/parser-overview.html#parsing-sequences
In this example pstmt_list is a sequence of statements separated and optionally ended by semicolons, so you could easily define the parser as
let pstmt_list = sepEndBy pstmt (pstring ";")
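For readers unfamiliar with this combinator, the following Python sketch shows roughly what "separated and optionally ended by" means as a loop. It is only an illustration of the idea, not FParsec's implementation; sep_end_by, parse_name and the token-list representation are assumptions made for this example.

```python
def sep_end_by(parse_item, separator, tokens):
    """Collect items separated, and optionally ended, by `separator`.

    `parse_item(tokens, pos)` must return (item, new_pos) or None.
    Returns (items, position after the last consumed token).
    """
    items, pos = [], 0
    while pos < len(tokens):
        result = parse_item(tokens, pos)
        if result is None:
            break  # no further item, e.g. after a trailing separator
        item, pos = result
        items.append(item)
        if pos < len(tokens) and tokens[pos] == separator:
            pos += 1  # consume the separator and look for another item
        else:
            break
    return items, pos


def parse_name(tokens, pos):
    """Hypothetical item parser: any token other than ';' is a statement."""
    return (tokens[pos], pos + 1) if tokens[pos] != ";" else None
```

With this sketch, both ["a", ";", "b"] and ["a", ";", "b", ";"] yield the items ["a", "b"], and an empty token list yields an empty list. No recursion is involved, so there is no self-referential type to worry about.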
The problem is that your pstmt_list parser produces values of some type, but when you use it in its own definition, you wrap the values of this type in an additional option type (using the opt combinator).
The F# compiler infers that the type of the values returned by the parser, say 'a, should be the same as the wrapped type 'a option (which is, of course, not possible).
Anyway, I don't think that this is quite what you need to do - the >>. combinator creates a parser that returns the result of the second argument, which means that you'll be ignoring all the results of pstmt parsed so far.
I think you probably need something like this:
let rec pstmt_list : Parser<int list, unit> =
  parse.Delay(fun () ->
    opt(pstmt .>> pchar ';') .>>. opt pstmt_list
    |>> (function Some(prev), Some(rest) -> prev::rest
                | Some(prev), _ -> [prev]
                | _, Some(rest) -> rest
                | _ -> [] ))
The additional use of Delay is to avoid declaring a value that refers directly to itself.
