PEGKit Keep trying rules - pegkit

Suppose I have a rule:
myCoolRule:
Word
| 'myCoolToken' Word otherRule
I supply as input myCoolToken something else now it attempts to parse it greedily matches myCoolToken as a word and then hits the something and says uhhh I expected EOF, if I arrange the rules so it attempts to match myCoolToken first all is good and parses perfectly, for that input.
I am wondering if it is possible for it to keep trying all the rules in that statement to see if any works. So it matches Word fails, comes back and then tries the next rule.
Here is the actual grammar rules causing problems:
columnName = Word;
typeName = Word;
//accepts CAST and cast
cast = { MATCHES_IGNORE_CASE(LS(1), #"CAST") }? Word ;
checkConstraint = 'CHECK' '('! expr ')'!;
expr = requiredExp optionalExp*;
requiredExp = (columnName
| cast '(' expr as typeName ')'
... more but not important
optionalExp ...not important
The input CHECK( CAST( abcd as defy) ) causes it to fail, even though it is valid
Is there a construct or otherwise to make it verify all rules before giving up.

Creator of PEGKit here.
If I understand your question, No, this is not possible. But this is a feature of PEGKit, not a bug.
Your question is related to "determinacy" vs "nondeterminacy". PEGKit is a "deterministic" toolkit (which is widely considered a desirable feature for parsing programming languages).
It seems you are looking for a more "nondeterministic" behavior in this case, but I don't think you should be :).
PEGKit allows you to specify the priority of alternate options via the order in which the alternate options are listed. So:
foo = highPriority
| lowerPriority
| lowestPriority
;
If the highPriority option matches the current input, the lowerPriority and lowestPriority options will not get a chance to try to match, even if they are somehow a "better" match (i.e. they match more tokens than highPriority).
Again, this is related to "determinacy" (highPriority is guaranteed to be given primacy) and is widely considered a desirable feature when parsing programming languages.
So if you want your cast() expressions to have a higher priority than columnName, simply list the cast() expression as an option before the columnName option.
requiredExp = (cast '(' expr as typeName ')'
| columnName
... more but not important
OK, so that takes care of the syntactic details. However, if you have higher-level semantic constraints which can affect parsetime decisions about which alternative should have the highest priority, you should use a Semantic Predicate, like:
foo = { shouldChooseOpt1() }? opt1
| { shouldChooseOpt2() }? opt2
| defaultOpt
;
More details on Semantic Predicates here.

Related

What does => mean in Ada?

I understand when and how to use => in Ada, specifically when using the keyword 'others', but I am not sure of its proper name nor how and why it was created. The history and development of Ada is very interesting to me and I would appreciate anyone's insight on this.
=> is called arrow. It is used with any form of parameter, not only with the parameter 'others'.
Section 6.4 of the Ada Reference Manual states:
parameter_association ::= [formal_parameter_selector_name =>]
explicit_actual_parameter
explicit_actual_parameter ::= expression | variable_name
A parameter_association is named or positional according to whether or
not the formal_parameter_selector_name is specified. Any positional
associations shall precede any named associations. Named associations
are not allowed if the prefix in a subprogram call is an
attribute_reference.
Similarly, array aggregates are described in section 4.3.3
array_aggregate ::= positional_array_aggregate |
named_array_aggregate
positional_array_aggregate ::=
(expression, expression {, expression}) | (expression {, expression}, others => expression) | (expression {, expression},
others => <>)
named_array_aggregate ::=
(array_component_association {, array_component_association})
array_component_association ::=
discrete_choice_list => expression | discrete_choice_list => <>
The arrow is used to associate an array index with a specific value or to associate a formal parameter name of a subprogram with the actual parameter.
Stack Overflow isn’t really the place for this kind of question, which is why it's received at least one close vote.
That said, "arrow" has been present in the language since its first version; see ARM83 2.2. See also the Ada 83 Rationale; section 3.5 seems to be the first place where it’s actually used, though not by name.
As a complement to Jim's answer, on the usage/intuitiveness side: the arrow X => A means in various places of the Ada syntax: value A goes to place X. It is very practical, for instance, to fill an array with an arbitrary cell order. See slide 8 of this presentation for an application with large arrays. Needless to say that the absence of the arrow notation would lead to a heap of bugs in such a case. Sometimes it is just useful for making the associations more readable. You can see it here in action for designing a game level.

Why can't a range of char be collected?

I'm trying to generate a vector containing lowercase ASCII characters. This more convoluted approach works:
let ascii_lowercase = (b'a'..=b'z').map(|b| b as char).collect::<Vec<char>>();
But this more straightforward one, which I came up with in the first place, does not:
let ascii_lowercase = ('a'..='z').collect::<Vec<char>>();
The error is:
error[E0599]: no method named `collect` found for type `std::ops::RangeInclusive<char>` in the current scope
--> src/main.rs:2:39
|
2 | let ascii_lowercase = ('a'..='z').collect::<Vec<char>>();
| ^^^^^^^
|
= note: the method `collect` exists but the following trait bounds were not satisfied:
`std::ops::RangeInclusive<char> : std::iter::Iterator`
`&mut std::ops::RangeInclusive<char> : std::iter::Iterator`
Which is weird, because as far as I understand, there is a blanket implementation of Iterator for RangeInclusive.
Is it impossible to use a range of chars as an iterator? If so, why? If not, what am I doing wrong? I'm using stable Rust 2018 1.31.1.
EDIT 2020-07-17: since Rust 1.45.0, the trait Step is implemented for char, making Range<char> (and some other char ranges) work as an iterator. The code in the question now compiles without problem!
Old answer below.
The expression b'a'..=b'z' has the type RangeInclusive<u8> (see on Playground) because the expression b'a' has the type u8: that's what the b in front of the character literal is for. On the other hand, the expression 'a'..='z' (without the bs) has the type RangeInclusive<char>.
[...] there is a blanket implementation of Iterator for RangeInclusive.
For one, this is not what we call "blanket implementation" (this is when the impl block is for T or for &T (or similar) with T being a generic type). But yes, there is an impl. But let's take a closer look:
impl<A> Iterator for RangeInclusive<A>
where
A: Step, // <--- important
The A: Step bound is important. As you can see in the documentation for Step, this trait is implemented for all primitive integer types, but not for char. This means that there is no clear "add one" operation on characters. Yes, you could define it to be the next valid Unicode codepoint, but the Rust developers probably decided against that for a good reason.
As a consequence, RangeInclusive<char> does not implement Iterator.
So your solution is already a good one. I would probably write this:
(b'a'..=b'z').map(char::from).collect::<Vec<_>>()
The only real advantage is that in this version, char doesn't appear twice.
The problem was that the iteration capabilities of range types depend on the Step trait (see extended answer).
However, starting from Rust 1.45, char also implements Step (PR 72413), which means that the code in the question now works!
let ascii_lowercase: Vec<char> = ('a'..='h').collect();
assert_eq!(
ascii_lowercase,
vec!['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
);

Replacing types in AST rascal

I am trying to replace all types in an AST.
Analyzing Java language using m3 model; definitions from here
If we take this code:
Int a = 1;
I am able to update the type of 1 to void for example.
But I am not able to change the type of the variable itself.
I've included some example lines.
Is someone able to point out the errors in the lines?
case \method(Type \return, str name, list[Declaration] parameters, list[Expression] exceptions)
=> \method(\int(), "funcHolder", parameters, exceptions)
case \type(Type \type) => \void()
case \type => \void
Ok, excellent question. First your code and the errors it might have:
This looks good:
case \method(Type \return, str name, list[Declaration] parameters, list[Expression] exceptions)
=> \method(\int(), "funcHolder", parameters, exceptions)
The definition is: data Declaration = \method(Type \return, str name, list[Declaration] parameters, list[Expression] exceptions, Statement impl); (see here), and your code follows exactly the definition. Every abstract method declaration in the ASTs you've parsed will match with this, since there is another \method declaration for methods with bodies with an additional argument.
It may be that you do not have abstract method bodies in your example code, in that case this does nothing.
A simpler version would also work fine:
case \method(_, _, parameters, exceptions) => \method(\int(), "funcHolder", parameters, exceptions)
The next one has issues:
case \type(Type \type) => \void()
Because data Expression = \type(Type \type), that is an Expression and data Type = \void() that is a Type or data TypeSymbol = \void() it is a TypeSymbol the rewrite is not type-preserving and would do wrong things if the Rascal compiler would not detect this. Mostly it will probably not work for you because your example code does not contain this particular kind of expression. I suspect it might be the abstract notation for things such as int.class and Integer.class.
Then this one is "interesting":
case \type => \void()
In principle, if \type is not bound in the current scope, then this matches literally anything. But probably there is a function called \type or a variable or something somewhere, and thus this pattern tests for equality with that other thing in scope. Very nasty! It will not match with anything I would guess. BTW, we are planning a "Rascal Amendement Proposal" for a language change to avoid such accidental bindings of things in the scope of a pattern.
Later from the commments I learned that the goal was to replace all instances of a Type in the AST by void(), to help in clone detection modulo type names. This is done as follows:
case Type _ => \void()
We use a [TypedVariable] pattern, with the variable name _ to match any node of algebraic type Type and forget about the binding. That node will then be replaced by void().
My way of working in the absence of a content-assist tool for pattern matching is as follows:
find full one AST example of something you want to match, say Int a = 1;
copy it into the Rascal source file
remove the parts you want to abstract from by introducing variables or _
test on the original example
test on the rest of the system by printing out the loc that have matched and clicking on the locs to bring you to the code and see if it wasn't a false positive.
for example I want to rewrite Int to void, I find an example of Int in an AST and paste it:
visit (ast) {
case simpleType(simpleName("Int")) => Type::\void() // I added Type:: to be sure to disambiguate with TypeSymbol::void()
}
With some debugging code attached to print out all matches:
visit (ast) {
case e:simpleType(simpleName("Int")) => Type::\void()
when bprintln("found a type at <e.src?|unknown:///|>");
}
Maybe you find out that that has way too many matches, and you have to become more specific, so let's only change declarations Int _ = ....;, we first take an example:
\variables(simpleType(simpleName("Int")), [...lots of stuff here...])
and then we simplify it:
\variables(simpleType(simpleName("Int")), names)
and then include it in the visit:
visit (ast) {
case \variables(simpleType(simpleName("Int")), names) => \variables(Type::\void(), names)
}
Final remark, as you can see you can nest patterns as deeply as you want, so any combination is fine. A "complex" example:
\variables(/"Int", names)
This pattern finds any variable declaration where the name "Int" is used somewhere as part of the type declaration. That's more loose than our original and might catch more than you bargained for. It's just to show what combinations you might want to do.
A final example of this: find all variable declarations with a type name which starts with "Int" but could end with anything else, like "Integer" or "IntFractal", etc:
\variables(simpleType(simpleName(/^Int/)), names)

Trying to understand the SML option structure

Okay so I started learning SML for a class, and I'm stuck with the option structure.
What I have so far for this example:
datatype suit = spades|hearts|clubs|diamonds;
datatype rank = ace|two|three|...|j|q|k|joker;
type card = suit*rank;
My lecturer was trying to explain the use of the option structure by saying that not all cards necessarily have a suit; jokers don't have a suit associated with them.
So when designing a function getsuit to get the suit of a card, we have the following:
datatype 'a option = NONE | SOME of 'a;
fun getsuit ((joker,_):card):suit option = NONE
| getsuit ((_,s):card):suit option = SOME s;
But using emacs, I get two errors, one saying how the pattern and constraint don't agree,
pattern: rank * ?.suit
constraint: rank * suit
and the other saying how the expression type and the resulting types don't agree.
expression: ?.suit option
result type: suit option
This was the code provided by the lecturer so clearly they're not of much help if it results in errors.
What's the meaning of "?." and why does it show up? How would I correctly define this function?
Not really a problem with option as you've defined it.
You've got the order of suit and rank in your card patterns the wrong way:
Try:
datatype 'a option = NONE | SOME of 'a;
fun getsuit ((_, joker):card):suit option = NONE
| getsuit ((s, _):card):suit option = SOME s;
My version of ML probably prints errors differently so I'm not sure how to explain the the meaning of ?. etc. But it's simple enough if you take it bit by bit:
Try
(clubs, ace);
The interpreter (or emacs if that's what you're using) tells you the type is a product of a suit * rank. That's ML's type inference at work, but you can specify the type (you expect) like this:
(clubs, ace): suit*rank;
Or
(clubs, ace): card; (* Works, since card is defined as (suit*rank) *)
And you won't have any complaints. But obviously you would if you did
(clubs, ace): rank*suit;
Or
(clubs, ace): card; (* card is defined as (rank*) *)
You placed a constraint on the type of getsuit's argument (that it must be a card, or the compatible (suit*rank) product), but the pattern is of type (rank*?) or (?*rank), neither of which are compatible with (suit*rank).

Using ANTLR with Left-Recursive Rules

Basically, I've written a parser for a language with just basic arithmetic operators ( +, -, * / ) etc, but for the minus and plus cases, the Abstract Syntax Tree which is generated has parsed them as right associative when they need to be left associative. Having googled for a solution, I found a tutorial that suggests rewriting the rule from:
Expression ::= Expression <operator> Term | Term`
to
Expression ::= Term <operator> Expression*
However, in my head this seems to generate the tree the wrong way round. Any pointers on a way to resolve this issue?
First, I think you meant
Expression ::= Term (<operator> Expression)*
Back to your question: You do not need to "resolve the issue", because ANTLR has no problem dealing with tail recursion. I'm nearly certain that it replaces tail recursion with a loop in the code that it generates. This tutorial (search for the chapter called "Expressions" on the page) explains how to arrive at the e1 = e2 (op e2)* structure. In general, though, you define expressions in terms of higher-priority expressions, so the actual recursive call happens only when you process parentheses and function parameters:
expression : relationalExpression (('and'|'or') relationalExpression)*;
relationalExpression : addingExpression ((EQUALS|NOT_EQUALS|GT|GTE|LT|LTE) addingExpression)*;
addingExpression : multiplyingExpression ((PLUS|MINUS) multiplyingExpression)*;
multiplyingExpression : signExpression ((TIMES|DIV|'mod') signExpression)*;
signExpression : (PLUS|MINUS)* primeExpression;
primeExpression : literal | variable | LPAREN expression /* recursion!!! */ RPAREN;

Resources