ANTLR: how to make a rewrite rule to set a lexeme as an AST node text

The following grammar rule aims to recognize expressions such as "a-b" in a grammar that generates an AST to evaluate a linear equation:
tokens {
PLUS = '+' ;
MINUS = '-' ;
DIV = '/' ;
EQUAL = '=' ;
MULT = '*' ;
}
minusExpr: (a=multExpr -> $a) (MINUS b=multExpr -> ^(PLUS $a ^(MINUS $b)))*;
The grammar is working correctly. The only problem that I have is that in the output AST, the text of the token is set to "PLUS" instead of "+".
For example, for the equation: x-1=11
it generates the following tree (the grammar has other rules that I haven't copied here):
(= (PLUS x (- 1)) 11)
Instead of the tree:
(= (+ x (- 1)) 11)
I would like to know how to rewrite the rule so that the AST node label is set to "+" instead of "PLUS". Thanks!

The text '+' is just the input that is converted by the lexer to tokens (in that case with a token type of PLUS). You cannot rewrite that as the lexer will always convert your input to tokens (because the parser only works with tokens).
However, each token internally stores the text it was created from. So when you walk your tree, you can get the original text of each token at any time by calling getText() on the CommonToken or the BaseTree class.

Suppress empty result from 'many' in 'seq(p, many(p))' construct with parser combinators

I'm trying to build parser combinators following Hutton and Meijer, "Monadic Parser Combinators". My implementation is in PostScript, but I think my issue is general to combinator parsers and not my specific implementation.
As a small exercise, I'm using the parsers to recognize regular expressions.
(pc9.ps)run
/Dot (.) char def
/Meta (*+?) anyof def
/Character (*+?.|()) noneof def
/Atom //Dot
//Character plus def
/Factor //Atom //Meta maybe seq def
/Term //Factor //Factor many seq def
/Expression //Term (|) char //Term xthen many seq def
/regex { string-input //Expression exec ps } def
(abc|def|ghi) regex
quit
It's working, but the output has lots of [] empty arrays that really get in the way when I try to bind handlers to process the values.
$ gsnd -q -dNOSAFER pc9re2.ps
stack:
[[[[[97 []] [[98 []] [[99 []] []]]] [[[100 []] [[101 []] [[102 []]
[]]]] [[[103 []] [[104 []] [[105 []] []]]] []]]] null]]
These are happening whenever a seq sequencing combinator accepts the result from maybe or many (which uses maybe) that had zero occurrences.
What is the normal way of excluding this extra noise in the output with Parser Combinators?

Sigh. It seems I can just work around it. I added special code in seq to detect an empty right-hand side and just discard it. On to other problems...
Edit: I encountered the same problem again in version 11 (and a half). Now I've got a better solution IMO:
https://groups.google.com/g/comp.lang.functional/c/MbJxrJSk8Mw/m/MoT3Dr0IAwAJ
Ugh. I think it wasn't even an X/Y problem. It was a "doctor it hurts
when I move my arm like this; ... so don't move your arm like that"
problem.
I want the "result" part of the "reply" structure (using new terms
following usage from the Parsec document) to be any of the /usual/
PostScript types: integer, real, string, boolean, array, dictionary.
But I also need some way to arbitrarily combine or concatenate two
objects regardless of type. My then (aka seq) combinator needs to
do this. So I made a hack-y function that does the combining. If it
has two arrays, it composes the contents into a longer array. If it
has one array and some other object it extends the array by one and
stuffs the object in the front or back as appropriate. If it has two
non-array objects it makes a new 2-element array to contain them.
So, instead of building xthen and thenx off of then and needing
to cons, car, and cdr the stuff, I can write all 3 of these as a more
general parameterized function.
sequence{ p q u }{
{ /p exec +is-ok {
next x-xs force /q exec +is-ok {
next x-xs 3 1 roll /u exec exch consok
}{
x-xs 3 2 roll ( after ) exch cons exch cons cons
} ifelse
} if } ll } #func
then { {append} sequence }
xthen { {exch pop} sequence }
thenx { {pop} sequence }
append { 1 index zero eq { exch pop }{
dup zero eq { pop }{
1 index type /arraytype eq {
dup type /arraytype eq { compose }{ one compose } ifelse
}{ dup type /arraytype eq { curry }{ cons } ifelse } ifelse } ifelse } ifelse }
(#func is my own non-standard extension to PostScript that takes a
procedure body and list of parameters and wraps the procedure with
code that defines the arguments in a local dictionary. ll is my
hack-y PostScript way of making lambdas with hard-patched parameters,
it's short for load all literals.)
The code also treats executable arrays (ie. PostScript procedures) as
a non-array for the purpose of combining sequences of results. This allows
the parser to be used as a syntax-directed compiler producing procedures
as output.
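The combining rule described above can also be sketched outside PostScript. The following JavaScript is only an illustration of the cases listed in the post, not a port of the actual code: the names ZERO and combine are hypothetical, ZERO stands in for the empty result that a zero-occurrence maybe or many produces, and plain functions stand in for executable arrays (which the post treats as non-arrays).

```javascript
// Hypothetical sentinel for "no result" (a zero-occurrence maybe/many).
const ZERO = Symbol("zero");

// Combine two parser results following the cases described in the post.
function combine(a, b) {
  if (a === ZERO) return b;                    // empty left side: keep the right
  if (b === ZERO) return a;                    // empty right side: keep the left
  const aIsArr = Array.isArray(a);
  const bIsArr = Array.isArray(b);
  if (aIsArr && bIsArr) return [...a, ...b];   // two arrays: one longer array
  if (aIsArr) return [...a, b];                // array + object: extend at the back
  if (bIsArr) return [a, ...b];                // object + array: extend at the front
  return [a, b];                               // two non-arrays: new 2-element array
}
```

Because Array.isArray returns false for functions, a procedure-like result is always treated as a single object here, which mirrors the remark about executable arrays.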

How to replace a path in AST with just parsed javascript(string)?

https://astexplorer.net/#/gist/70df1bc56b9ee73d19fc949d2ef829ed/7e14217fd8510f0bf83f3372bf08454b7617bce1
I'm trying to replace an expression, and I don't care what's in it.
In this example, I've found the this.state.showMenu && this.handleMouseDown portion in
<a
onMouseDown={this.state.showMenu && this.handleMouseDown}
>
I need to convert to:
<a
onMouseDown={this.state.showMenu ? this.handleMouseDown : undefined}
>
How can I do so without explicitly reconstructing the tree? I just want to do something like
path.replaceText("this.state.showMenu ? this.handleMouseDown : undefined")

Here's a transformer that does what you describe:
export default function transformer(file, api) {
const j = api.jscodeshift;
const root = j(file.source)
root
.find(j.JSXExpressionContainer)
.replaceWith(path => {
return j.jsxExpressionContainer(
j.conditionalExpression(
j.identifier(j(path.value.expression.left).toSource()),
j.identifier(j(path.value.expression.right).toSource()),
j.identifier('undefined')
)
)
})
return root.toSource()
}
See it in action here.
You can also just put arbitrary text in the JSXExpressionContainer node:
export default function transformer(file, api) {
const j = api.jscodeshift;
const root = j(file.source)
root
.find(j.JSXExpressionContainer)
.replaceWith(path => {
return j.jsxExpressionContainer(
j.identifier('whatever you want')
)
})
return root.toSource()
}
See this example.
Finally, you don't even need to return a JSXExpressionContainer.
export default function transformer(file, api) {
const j = api.jscodeshift;
const root = j(file.source)
root
.find(j.JSXExpressionContainer)
.replaceWith(path => {
return j.identifier("this isn't valid JS, but works just fine")
})
return root.toSource()
}
See the result here.

You can do this with our DMS Software Reengineering Toolkit.
DMS treats HTML pages as native HTML text with an embedded scripting sublanguage, which might be ECMAScript, or VBScript, or something else.
So the process of building a complete HTML "AST" requires that one first build the pure HTML part, then find all the "onXXXXX" tags and convert them to ASTs in the chosen scripting language. DMS can distinguish AST nodes from different languages, so there's no chance of confusion in understanding the compound AST.
So, first we need to parse the HTML document of interest (code edited for pedagogical reasons):
(local (;; [my_HTML_AST AST:Node]
(includeunique `DMS/Domains/HTML/Component/ParserComponent.par')
);;
(= working_graph (AST:CreateForest))
(= my_HTML_AST (Parser:ParseFile parser working_graph input_file_full_path))
Then we need to walk over the HTML tree, find the JavaScript text fragments, parse them, and splice the parsed ECMAScript tree in to replace the text fragment:
(local (;; (includeunique `DMS/Domains/ECMAScript/Components/ParserComponent.par') );;
(ScanNodes my_HTML_AST
(lambda (function boolean AST:Node)
(ifthenelse (!! (~= (AST:GetNodeType ?) GrammarConstants:Rule:Attribute) ; not an HTML attribute
(~= (Strings:Prefix (AST:GetLiteralString (AST:GetFirstChild ?)) `on')) ; not an action attribute
)&&
~t ; scan deeper into tree
(value (local (;; [my_ECMAScript_AST AST:Node]
[ECMAScript_text_stream streams:buffer]
);;
(= ECMAScript_text_stream (InputStream:MakeBufferStream (AST:StringLiteral (AST:GetSecondChild ?))))
(= my_ECMAScript_AST (Parser:ParseStream parser working_graph ECMAScript_text_stream))
(= AST:ReplaceNode ? my_ECMAScript_AST)
(= InputStream:Close ECMAScript_text_stream)
~f) ; no need to scan deeper here
)ifthenelse
)lambda
) ; at this point, we have a mixed HTML/ECMAScript tree
)local
If the scripting language can be something else, then this code has to change. If your pages are all HTML + ECMAScript, you can wrap the above stuff into a black box and call it "(ParseHTML)" which is what the other answer assumed happened.
Now for the actual work. The OP wants to replace a pattern found in the HTML with another. Here DMS shines because you can write those patterns, using the syntax of the targeted language, directly as a DMS Rewrite Rule (see this link for details).
source domain ECMAScript;
target domain ECMAScript;
rule OP_special_rewrite()=expression -> expression
"this.state.showMenu && this.handleMouseDown"
-> "this.state.showMenu ? this.handleMouseDown : undefined "
Now you need to apply this rewrite:
(RSL:Apply my_HTML_AST `OP_special_rewrite') ; applies this rule to every node in AST
; only those that match get modified
And finally regenerate text from the AST:
(PrettyPrinter:PrintStream my_ECMAScript_AST input_file_full_path)
OP's example is pretty simple because it matches against what amounts to a constant pattern. DMS's rules can be written using all kinds of pattern variables (see the above link), and can have arbitrary conditions over the matched pattern and other state information to control whether the rule applies.

Erlang: How to create a function that returns a string containing the date in YYMMDD format?

I am trying to learn Erlang and I am working on the practice problems Erlang has on the site. One of them is:
Write the function time:swedish_date() which returns a string containing the date in swedish YYMMDD format:
time:swedish_date()
"080901"
My function:
-module(demo).
-export([swedish_date/0]).
swedish_date() ->
[YYYY,MM,DD] = tuple_to_list(date()),
string:substr((integer_to_list(YYYY, 3,4)++pad_string(integer_to_list(MM))++pad_string(integer_to_list(DD)).
pad_string(String) ->
if
length(String) == 1 -> '0' ++ String;
true -> String
end.
I'm getting the following errors when compiled.
demo.erl:6: syntax error before: '.'
demo.erl:2: function swedish_date/0 undefined
demo.erl:9: Warning: function pad_string/1 is unused
error
How do I fix this?

After fixing your compilation errors, you're still facing runtime errors. Since you're trying to learn Erlang, it's instructive to look at your approach and see if it can be improved, and fix those runtime errors along the way.
First let's look at swedish_date/0:
swedish_date() ->
[YYYY,MM,DD] = tuple_to_list(date()),
Why convert the tuple to a list? Since you use the elements individually and never use the list as a whole, the conversion serves no purpose. You can instead just pattern-match the returned tuple:
{YYYY,MM,DD} = date(),
Next, you're calling string:substr/1, which doesn't exist:
string:substr((integer_to_list(YYYY,3,4) ++
pad_string(integer_to_list(MM)) ++
pad_string(integer_to_list(DD))).
The string:substr/2,3 functions both take a starting position, and the 3-arity version also takes a length. You don't need either, and can avoid string:substr entirely and instead just return the assembled string:
integer_to_list(YYYY,3,4) ++
pad_string(integer_to_list(MM)) ++
pad_string(integer_to_list(DD)).
Whoops, this is still not right: there is no such function integer_to_list/3, so just replace that first call with integer_to_list/1:
integer_to_list(YYYY) ++
pad_string(integer_to_list(MM)) ++
pad_string(integer_to_list(DD)).
Next, let's look at pad_string/1:
pad_string(String) ->
if
length(String) == 1 -> '0' ++ String;
true -> String
end.
There's a runtime error here because '0' is an atom and you're attempting to append String, which is a list, to it. The error looks like this:
** exception error: bad argument
in operator ++/2
called as '0' ++ "8"
Instead of just fixing that directly, let's consider what pad_string/1 does: it adds a leading 0 character if the string is a single digit. Instead of using if to check for this condition — if isn't used that often in Erlang code — use pattern matching:
pad_string([D]) ->
[$0,D];
pad_string(S) ->
S.
The first clause matches a single-element list, and returns a new list with the element D preceded with $0, which is the character constant for the character 0. The second clause matches all other arguments and just returns whatever is passed in.
Here's the full version with all changes:
-module(demo).
-export([swedish_date/0]).
swedish_date() ->
{YYYY,MM,DD} = date(),
integer_to_list(YYYY) ++
pad_string(integer_to_list(MM)) ++
pad_string(integer_to_list(DD)).
pad_string([D]) ->
[$0,D];
pad_string(S) ->
S.
But a simpler approach would be to use the io_lib:format/2 function to just format the desired string directly:
swedish_date() ->
io_lib:format("~w~2..0w~2..0w", tuple_to_list(date())).
First, note that we're back to calling tuple_to_list(date()). This is because the second argument for io_lib:format/2 must be a list. Its first argument is a format string, which in our case says to expect three arguments, formatting each as an Erlang term, and formatting the 2nd and 3rd arguments with a width of 2 and 0-padded.
But there's still one more step to address, because if we run the io_lib:format/2 version we get:
1> demo:swedish_date().
["2015",["0",56],"29"]
Whoa, what's that? It's simply a deep list, where each element of the list is itself a list. To get the format we want, we can flatten that list:
swedish_date() ->
lists:flatten(io_lib:format("~w~2..0w~2..0w", tuple_to_list(date()))).
Executing this version gives us what we want:
2> demo:swedish_date().
"20150829"
Find the final full version of the code below.
-module(demo).
-export([swedish_date/0]).
swedish_date() ->
lists:flatten(io_lib:format("~w~2..0w~2..0w", tuple_to_list(date()))).
UPDATE: @Pascal comments that the year should be printed as 2 digits rather than 4. We can achieve this by passing the date list through a list comprehension:
swedish_date() ->
    DateVals = [D rem 100 || D <- tuple_to_list(date())],
    lists:flatten(io_lib:format("~2..0w~2..0w~2..0w", DateVals)).
This applies the rem remainder operator to each of the list elements returned by tuple_to_list(date()). The operation is needless for month and day, but I think it's cleaner than extracting the year and processing it individually. The year is now also zero-padded to a width of 2, so that a year such as 2008 prints as 08 rather than 8. The result:
3> demo:swedish_date().
"150829"
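For readers more at home outside Erlang, the two ideas used above (reduce each component modulo 100, then zero-pad every component to a width of 2) can be sketched in JavaScript. This is only an illustration: the function name swedishDate and its Date parameter are assumptions made for the example and are not part of the exercise.

```javascript
// Format a date as YYMMDD: take each part modulo 100, then zero-pad to two digits.
function swedishDate(date) {
  // getMonth() is zero-based, so add 1 to get the calendar month.
  const parts = [date.getFullYear(), date.getMonth() + 1, date.getDate()];
  return parts
    .map((n) => String(n % 100).padStart(2, "0"))
    .join("");
}
```

Passing the date in as a parameter, rather than reading the clock inside the function, keeps the sketch easy to check against known dates.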

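There are a few issues here:
You are missing a parenthesis at the end of line 6.
You are trying to call integer_to_list/3 when Erlang only defines integer_to_list/1,2.
string:substr/1 does not exist; to keep only the last two digits of the year, call string:substr/3 on the year string.
'0' is an atom, not a string, so '0' ++ String fails at runtime; use "0" instead.
This will work:
-module(demo).
-export([swedish_date/0]).
swedish_date() ->
    [YYYY,MM,DD] = tuple_to_list(date()),
    %% Keep the last two digits of the four-digit year.
    string:substr(integer_to_list(YYYY), 3, 2) ++
        pad_string(integer_to_list(MM)) ++
        pad_string(integer_to_list(DD)).
pad_string(String) ->
    if
        length(String) == 1 -> "0" ++ String;
        true -> String
    end.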

In addition to the parenthesis error on line 6, you also have an error on line 10, where you use the form '0' instead of "0", so you define an atom rather than a string.
I understand you are doing this for educational purposes, but I encourage you to dig into the Erlang libraries; it is something you will have to do. For a common problem like this, functions already exist that help you:
swedish_date() ->
{YYYY,MM,DD} = date(), % not useful to transform into list
lists:flatten(io_lib:format("~2.10.0B~2.10.0B~2.10.0B",[YYYY rem 100,MM,DD])).
% ~X.Y.ZB means: format the integer in base Y, print X characters, use Z for padding

Extract nth element of a tuple

For a list, you can do pattern matching and iterate until the nth element, but for a tuple, how would you grab the nth element?

TL;DR: Stop trying to directly access the n-th element of a t-uple, and use a record or an array instead, as they allow random access.
You can grab the n-th element by unpacking the t-uple with value deconstruction, either by a let construct, a match construct or a function definition:
let ivuple = (5, 2, 1, 1)
let squared_sum_let =
let (a,b,c,d) = ivuple in
a*a + b*b + c*c + d*d
let squared_sum_match =
match ivuple with (a,b,c,d) -> a*a + b*b + c*c + d*d
let squared_sum_fun (a,b,c,d) =
a*a + b*b + c*c + d*d
The match-construct has here no virtue over the let-construct, it is just included for the sake of completeness.
Do not use t-uples, Don¹
There are only a few cases where using t-uples to represent a type is the right thing to do. Most of the times, we pick a t-uple because we are too lazy to define a type and we should interpret the problem of accessing the n-th field of a t-uple or iterating over the fields of a t-uple as a serious signal that it is time to switch to a proper type.
There are two natural replacements to t-uples: records and arrays.
When to use records
We can see a record as a t-uple whose entries are labelled; as such, they are definitely the most natural replacement to t-uples if we want to access them directly.
type ivuple = {
a: int;
b: int;
c: int;
d: int;
}
We then directly access the field a of a value x of type ivuple by writing x.a. Note that records are easily copied with modifications, as in let y = { x with d = 0 }. There is no natural way to iterate over the fields of a record, mostly because a record does not need to be homogeneous.
When to use arrays
A large² homogeneous collection of values is adequately represented by an array, which allows direct access, iterating and folding. A possible inconvenience is that the size of an array is not part of its type, but for arrays of fixed size, this is easily circumvented by introducing a private type — or even an abstract type. I described an example of this technique in my answer to the question “OCaml compiler check for vector lengths”.
Note on float boxing
When using floats in t-uples, in records containing only floats and in arrays, these are unboxed. We should therefore not notice any performance modification when changing from one type to the other in our numeric computations.
¹ See the TeXbook.
² Large starts near 4.

Since the length of OCaml tuples is part of the type and hence known (and fixed) at compile time, you get the n-th item by straightforward pattern matching on the tuple. For the same reason, the problem of extracting the n-th element of an "arbitrary-length tuple" cannot occur in practice - such a "tuple" cannot be expressed in OCaml's type system.
You might still not want to write out a pattern every time you need to project a tuple, and nothing prevents you from generating the functions get_1_1...get_i_j... that extract the i-th element from a j-tuple for any possible combination of i and j occurring in your code, e.g.
let get_1_1 (a) = a
let get_1_2 (a,_) = a
let get_2_2 (_,a) = a
let get_1_3 (a,_,_) = a
let get_2_3 (_,a,_) = a
...
Not necessarily pretty, but possible.
Note: Previously I had claimed that OCaml tuples can have at most length 255 and you can simply generate all possible tuple projections once and for all. As @Virgile pointed out in the comments, this is incorrect - tuples can be huge. This means that it is impractical to generate all possible tuple projection functions upfront, hence the restriction "occurring in your code" above.

It's not possible to write such a function in full generality in OCaml. One way to see this is to think about what type the function would have. There are two problems. First, each size of tuple is a different type. So you can't write a function that accesses elements of tuples of different sizes. The second problem is that different elements of a tuple can have different types. Lists don't have either of these problems, which is why you can have List.nth.
If you're willing to work with a fixed size tuple whose elements are all the same type, you can write a function as shown by @user2361830.
Update
If you really have collections of values of the same type that you want to access by index, you should probably be using an array.

Here is a function which prints the source of the OCaml function you need to do that ;) Very helpful, I use it frequently.
let tup len n =
if n>=0 && n<len then
let rec rep str nn = match nn<1 with
|true ->""
|_->str ^ (rep str (nn-1))in
let txt1 ="let t"^(string_of_int len)^"_"^(string_of_int n)^" tup = match tup with |" ^ (rep "_," n) ^ "a" and
txt2 =","^(rep "_," (len-n-2)) and
txt3 ="->a" in
if n = len-1 then
print_string (txt1^txt3)
else
print_string (txt1^txt2^"_"^txt3)
else raise (Failure "Error") ;;
For example:
tup 8 6;;
prints:
let t8_6 tup = match tup with |_,_,_,_,_,_,a,_->a
and of course:
val t8_6 : 'a * 'b * 'c * 'd * 'e * 'f * 'g * 'h -> 'g = <fun>

Is this an F# quotations bug?

[<ReflectedDefinition>]
let rec x = (fun() -> x + "abc") ()
The sample code with the recursive value above produces the following F# compiler error:
error FS0432: [<ReflectedDefinition>] terms cannot contain uses of the prefix splice operator '%'
I can't see any splicing operator usage in the code above, so it looks like a bug... :)
Looks like this is the problem with the quotation via ReflectedDefinitionAttribute only, normal quotation works well:
let quotation =
    <@ let rec x = (fun() -> x + "abc") () in x @>
produces expected result with the hidden Lazy.create and Lazy.force usages:
val quotation : Quotations.Expr<string> =
LetRecursive
([(x, Lambda (unitVar,
Application
(Lambda (unitVar0,
Call (None,
String op_Addition[String,String,String](String, String),
[Call (None,
String Force[String](Lazy`1[System.String]),
[x]), Value ("abc")])),
Value (<null>)))),
(x, Call (None, Lazy`1[String] Create[String](FSharpFunc`2[Unit,String]), [x])),
(x, Call (None, String Force[String](Lazy`1[String]), [x]))], x)
So the question is: is this an F# compiler bug or not?

I'd think that this may be caused by the treatment of recursive values in F#. As a workaround, you can turn the recursive reference into a parameter:
[<ReflectedDefinition>]
let foo x = (fun() -> x + "abc") ()
// To construct the recursive value, you'd write:
let rec x = foo x
The last line is of course invalid (just like your original code), because you're creating an immediate recursive reference, but it should give you the idea - in reality, you'd probably enclose x in a lambda function.
EDIT Originally, I thought that the problem may be as below, but I'm not sure now (see comments).
It looks more like a (probably known) limitation to me than an unexpected bug. There is an important difference between the two versions of the code you wrote - in the first case, you're binding a public value (visible to .NET) named x while in the second case, x is just a symbol used only in the quotation.
The quotation that would have to be stored in the meta-data of the assembly would look like this:
let rec x = <@ (fun() -> %x + "abc") () @>
The body is quoted, but x is not a quoted symbol, so it needs to be spliced into the quotation (that is, it will be evaluated and the result will be used in its place). Note that this code will fail, because you're declaring a recursive value with immediate reference - x needs to be evaluated as part of its definition, so this won't work.
However, I think that % cannot appear in ReflectedDefinition quotations (that is, you cannot store the above in meta-data), because it involves some runtime aspects - you'd need to evaluate x when loading the meta-data.