Is an abstract-syntax-tree always a binary tree? - abstract-syntax-tree

I have seen many examples of AST doing arithmetic operations which look like this:
+
/ \
1 2
above represent 1 + 2. My question is, are all ASTs binary by definition or a node could possibly have more than 2 children?

Yes, in an AST, nodes often have more than two children. For example, a block of code is typically a node with an array of child nodes, which are statements.

Related

Idiomatic graphs in APL

APL is great for array type problems but I'm curious as to how best work with graphs in APL. I'm playing around with leet questions, for example question 662. Maximum Width of Binary Tree, the exercise works with Node objects with a value/left/right pointer style, however the test-case uses a basic array like [1,3,null,5,3]. The notation is compressed; uncompressed would be [[1], [3,null], [5,3,null,null]]. Reading layer-by-layer give [[1], [3], [5,3]] (so 2 is the widest layer).
Another example,
[5,4,7,3,null,2,null,-1,null,9] gives the answer 2
So I'm not sure the idiomatic way to work with trees. Do I use classes? Or are arrays best? In either case how do I convert the input?
I came up with a couple of solutions, but both feel inelegant. (Apologies for lack of comments)
convert←{
prev ← {(-⌈2÷⍨≢⍵)↑⍵}
nxt←{
⍵≡⍬:⍺
m←2/×prev ⍺
cnt←+/m
(⍺,(m\cnt↑⍵))nxt(cnt↓⍵)
}
(1↑⍵)nxt(1↓⍵)
}
Alternatively,
convert ← {
total←(+/×⍵)
nxt←{
double←×1,2↓2/0,⍵
(((+/double)↑⍺)#⊢)double
}
⍵ nxt⍣{(+/×⍺)=total}1
}
Both solutions are limited in they assume that 0 is null.
Once I've decompressed the input it's simply just a matter of stratifying by it's order
⌈/(1+⌈/-⌊/)∘⍸¨×nodes⊆⍨⍸2*¯1+⍳⌈2⍟≢nodes
In Python though I could use other methods to traverse i.e. keep track of the left/right-most node on a per-depth basis.
NOTE: This may be two questions, one to decompress and the other how to traverse graphs in general, but one depends on the other
Any ideas?
The work of Co-dfns compiler has given lots of insights on working tree/graph like data structures with APL.
Thesis: A Data Parallel Compiler Hosted on the GPU
GitHub repo: github.com/Co-dfns/Co-dfns (Many related goodies in project README file)
However the thesis is quite lengthy so for this particular exercise I would give a brief explanation on how to approach it.
the exercise works with Node objects with a value/left/right pointer style, however the test-case uses a basic array like [1,3,null,5,3].
Do we really actually build the tree with Node type objects to get an answer to the question? You can write the solution in something like Python and translate to APL, but that would be, losing the whole point of writing it in APL...
Notice the input is already an array! It is a bfs traverse of the binary tree. (The co-dfns compiler uses dfs traverse order, though)
so, actually what we need to do is just built a matrix like below for the input like [1,3,2,5,3,null,9] (⍬ is a placeholder value for for null):
1 ⍬ ⍬ ⍬ ⍝ level 0
3 2 ⍬ ⍬ ⍝ level 1
5 3 ⍬ 9 ⍝ level 2
For this problem we don't need to know which node's parent is which.
We can even do something like, by abusing the fact that input has no negative value (even the number could be negative, actually we only care about if it is null), and change ⍬ to ¯1 or 0 and make it easier to compute the answer.
So the problem has became: "compute the matrix representation of the tree as variable tree from the input array, then calculate the width of each level by +/0<tree, then the output is just 2*level (notice the first level is level-0)" This is using wrong definition for the width. I'll show how to correct it below
And it is actually very easy to do the conversion from input to matrix, hint: ↑.
1 (3 2) 5
┌─┬───┬─┐
│1│3 2│5│
└─┴───┴─┘
↑1 (3 2) 5
1 0
3 2
5 0
Thanks for pointing out that my original solution has problem on constructing the tree matrix.
This is the corrected method for constructing the tree. To distinguish from 0 for null and the padding, I add one to the input array so 2 is for non-null and 1 is for null.
buildmatrix←{
⎕IO←0
in←1+(⊂⊂'null')(≢⍤1 0)⎕JSON ⍵
⍝ Build the matrix
loop←{
(n acc)←⍺
0=≢⍵:acc
cur←n↑⍵
(2×+/2=cur)(acc,⊂cur)∇ n↓⍵
}
↑1 ⍬ loop in
}
However since the definition for width here is:
The width of one level is defined as the length between the end-nodes (the leftmost and rightmost non-null nodes), where the null nodes between the end-nodes are also counted into the length calculation.
We can just compute the width while attempting to reconstructing the tree (compute each level's width using \ and / with patterns from previous level):
If last level is 1011 and next level is 100010
1 0 1 1
1 0 0 0 0 0 1 0
(2/1 0 1 1)\1 0 0 0 1 0
1 0 0 0 0 0 1 0
So the it isn't needed to construct the complete matrix, and the answer for the exercise is just:
width←{
⎕IO←0
in←(⊂⊂'null')(≢⍤1 0)⎕JSON ⍵
⍝ strip leading trailing zero
strip←(⌽⍳∘1↓⊢)⍣2
⍝ Build the matrix
loop←{
(prev mw)←⍺
0=≢⍵:mw
cur←⍵↑⍨n←2×+/prev
((⊢,⍥⊂mw⌈≢)strip cur\⍨2/prev)∇ n↓⍵
}
(,1)1 loop 1↓in
}
width '[1,null,2,3,null,4,5,6]'
2
And the interesting fact is, you can probably do the same in other non-array based languages like Haskell. So instead of translating existing algorithms between similar looking languages, by thinking in the APL way you find new algorithms for problems!

Perl 6 calculate the average of an int array at once using reduce

I'm trying to calculate the average of an integer array using the reduce function in one step. I can't do this:
say (reduce {($^a + $^b)}, <1 2 3>) / <1 2 3>.elems;
because it calculates the average in 2 separate pieces.
I need to do it like:
say reduce {($^a + $^b) / .elems}, <1 2 3>;
but it doesn't work of course.
How to do it in one step? (Using map or some other function is welcomed.)
TL;DR This answer starts with an idiomatic way to write equivalent code before discussing P6 flavored "tacit" programming and increasing brevity. I've also added "bonus" footnotes about the hyperoperation Håkon++ used in their first comment on your question.5
Perhaps not what you want, but an initial idiomatic solution
We'll start with a simple solution.1
P6 has built in routines2 that do what you're asking. Here's a way to do it using built in subs:
say { sum($_) / elems($_) }(<1 2 3>); # 2
And here it is using corresponding3 methods:
say { .sum / .elems }(<1 2 3>); # 2
What about "functional programming"?
First, let's replace .sum with an explicit reduction:
.reduce(&[+]) / .elems
When & is used at the start of an expression in P6 you know the expression refers to a Callable as a first class citizen.
A longhand way to refer to the infix + operator as a function value is &infix:<+>. The shorthand way is &[+].
As you clearly know, the reduce routine takes a binary operation as an argument and applies it to a list of values. In method form (invocant.reduce) the "invocant" is the list.
The above code calls two methods -- .reduce and .elems -- that have no explicit invocant. This is a form of "tacit" programming; methods written this way implicitly (or "tacitly") use $_ (aka "the topic" or simply "it") as their invocant.
Topicalizing (explicitly establishing what "it" is)
given binds a single value to $_ (aka "it") for a single statement or block.
(That's all given does. Many other keywords also topicalize but do something else too. For example, for binds a series of values to $_, not just one.)
Thus you could write:
say .reduce(&[+]) / .elems given <1 2 3>; # 2
Or:
$_ = <1 2 3>;
say .reduce(&[+]) / .elems; # 2
But given that your focus is FP, there's another way that you should know.
Blocks of code and "it"
First, wrap the code in a block:
{ .reduce(&[+]) / .elems }
The above is a Block, and thus a lambda. Lambdas without a signature get a default signature that accepts one optional argument.
Now we could again use given, for example:
say do { .reduce(&[+]) / .elems } given <1 2 3>; # 2
But we can also just use ordinary function call syntax:
say { .reduce(&[+]) / .elems }(<1 2 3>)
Because a postfix (...) calls the Callable on its left, and because in the above case one argument is passed in the parens to a block that expects one argument, the net result is the same as the do4 and the given in the prior line of code.
Brevity with built ins
Here's another way to write it:
<1 2 3>.&{.sum/.elems}.say; #2
This calls a block as if it were a method. Imo that's still eminently readable, especially if you know P6 basics.
Or you can start to get silly:
<1 2 3>.&{.sum/$_}.say; #2
This is still readable if you know P6. The / is a numeric (division) operator. Numeric operators coerce their operands to be numeric. In the above $_ is bound to <1 2 3> which is a list. And in Perls, a collection in numeric context is its number of elements.
Changing P6 to suit you
So far I've stuck with standard P6.
You can of course write subs or methods and name them using any Unicode letters. If you want single letter aliases for sum and elems then go ahead:
my (&s, &e) = &sum, &elems;
But you can also extend or change the language as you see fit. For example, you can create user defined operators:
#| LHS ⊛ RHS.
#| LHS is an arbitrary list of input values.
#| RHS is a list of reducer function, then functions to be reduced.
sub infix:<⊛> (#lhs, *#rhs (&reducer, *#fns where *.all ~~ Callable)) {
reduce &reducer, #fns».(#lhs)
}
say <1 2 3> ⊛ (&[/], &sum, &elems); # 2
I won't bother to explain this for now. (Feel free to ask questions in the comments.) My point is simply to highlight that you can introduce arbitrary (prefix, infix, circumfix, etc.) operators.
And if custom operators aren't enough you can change any of the rest of the syntax. cf "braid".
Footnotes
1 This is how I would normally write code to do the computation asked for in the question. #timotimo++'s comment nudged me to alter my presentation to start with that, and only then shift gears to focus on a more FPish solution.
2 In P6 all built in functions are referred to by the generic term "routine" and are instances of a sub-class of Routine -- typically a Sub or Method.
3 Not all built in sub routines have correspondingly named method routines. And vice-versa. Conversely, sometimes there are correspondingly named routines but they don't work exactly the same way (with the most common difference being whether or not the first argument to the sub is the same as the "invocant" in the method form.) In addition, you can call a subroutine as if it were a method using the syntax .&foo for a named Sub or .&{ ... } for an anonymous Block, or call a method foo in a way that looks rather like a subroutine call using the syntax foo invocant: or foo invocant: arg2, arg3 if it has arguments beyond the invocant.
4 If a block is used where it should obviously be invoked then it is. If it's not invoked then you can use an explicit do statement prefix to invoke it.
5 Håkon's first comment on your question used "hyperoperation". With just one easy to recognize and remember "metaop" (for unary operations) or a pair of them (for binary operations), hyperoperations distribute an operation to all the "leaves"6 of a data structure (for an unary) or create a new one based on pairing up the "leaves" of a pair of data structures (for binary operations). NB. Hyperoperations are done in parallel7.
6 What is a "leaf" for a hyperoperation is determined by a combination of the operation being applied (see the is nodal trait) and whether a particular element is Iterable.
7 Hyperoperation is applied in parallel, at least semantically. Hyperoperation assumes8 that the operations on the "leaves" have no mutually interfering side-effects -- that is to say, that any side effect when applying the operation to one "leaf" can safely be ignored in respect to applying the operation to any another "leaf".
8 By using a hyperoperation the developer is declaring that the assumption of no meaningful side-effects is correct. The compiler will act on the basis it is, but will not check that it is true. In the safety sense it's like a loop with a condition. The compiler will follow the dev's instructions, even if the result is an infinite loop.
Here is an example using given and the reduction meta operator:
given <1 2 3> { say ([+] $_)/$_.elems } ;

Questions about Abstract Syntax Tree (AST)

I am finding difficulties to understand
1) AST matching, how two AST's are similar? Are types included in the comparison/matching or only the operations like +, -, ++,...etc inlcuded?
2) Two statements are syntactically similar (This term I read it somewhere in a paper), can we say the below example that the two statement are syntactically similar?
int x = 1 + 2
String y = "1" + "2"
Java - Eclipse is what am using right now and trying to understand the AST for.
Best Regards,
What ASTs are:
An AST is a data structure representing program source text that consists of nodes that contain a node type, and possibly a literal value, and a list of children nodes. The node type corresponds to what OP calls "operations" (+, -, ...) but also includes language commands (do, if, assignment, call, ...) , declarations (int, struct, ...) and literals (number, string, boolean). [It is unclear what OP means by "type"]. (ASTs often have additional information in each node referring back to the point of origin in the source text file).
What ASTs are for:
OP seems puzzled by the presence of ASTs in Eclipse.
ASTs are used to represent program text in a form which is easier to interpret than the raw text. They provide a means to reason about the program structure or content; sometimes they are used to enable the modification of program ("refactoring") by modifying the AST for a program and then regenerating the text from the AST.
Comparing ASTs for similarity is not a really common use in my experience, except in clone detection and/or pattern matching.
Comparing ASTs:
Comparing ASTs for equality is easy: compare the root node type/literal value for equality; if not equal, the comparision is complete, else (recursively) compare the children nodes).
Comparing ASTs of similarity is harder because you must decide how to relax the equality comparision. In particular, you must decide on a precise definition of similarity. There are many ways to define this, some rather shallow syntactically, some more semantically sophisticated.
My paper Clone Detection Using Abstract Syntax Trees describes one way to do this, using similarity defined as the ratio of the number of nodes shared divided by the number of nodes total in both trees. Shared nodes are computed by comparing the trees top down to the point where some child is different. (The actual comparision is to compute an anti-unifier). This similary measure is rather shallow, but it works remarkably well in finding code clones in big software systems.
From that perspective, OPs's examples:
int x = 1 + 2
String y = "1" + "2"
have trees written as S-expressions:
(declaration_with_assignment (int x) (+ 1 2))
(declaration_with_assignment (String y) (+ "1" "2"))
These are not very similar; they only share a root node whose type is "declaration-with-assignment" and the top of the + subtree. (Here the node count is 12 with only 2 matching nodes for a similarity of 2/12).
These would be more similar:
int x = 1 + 2
float x = 1.0 + 2
(S-expressions)
(declaration_with_assignment (int x) (+ 1 2))
(declaration_with_assignment (float x) (+ 1.0 2))
which share the declaration with assignment, the add node, the literal leaf node 2, and arguably the literal nodes for integer 1 and float 1.0, depending on whether you wish to define them as "equal" or not, for a similarity of 4/12.
If you change one of the trees to be a pattern tree, in which some "leaves" are pattern variables, you can then use such pattern trees to find code that has certain structure.
The surface syntax pattern:
?type ?variable = 1 + ?expression
with S-expression
((declaration_with_assignment (?type ?varaible)) (+ 1 ?expression))
matches the first of OP's examples but not the second.
As far as I know, Eclipse doesn't offer any pattern-based matching abilities.
But these are very useful in program analysis and/or program transformation tools. For some specific examples, too long to include here, see DMS Rewrite Rules
(Full disclosure: DMS is a product of my company. I'm the architect).

Find path for N levels with repeating pattern of directional relationships in Neo4J

I'm trying to use Neo4j to analyze relationships in a family tree. I've modeled it like so:
(p1:Person)-[:CHILD]->(f:Family)<-[:FATHER|MOTHER]-(p2)
I know I could have left out the family label and just had children connected to each parent, but that's not practical for my purposes. Here's an example of my graph and the black line is the path I want it to generate:
I can query for it with
MATCH p=(n {personID:3})-[:CHILD]->()<-[:FATHER|MOTHER]-()-[:CHILD]->()<-[:FATHER|MOTHER]-()-[:CHILD]->()<-[:FATHER|MOTHER]-() RETURN p
but there's a repeating pattern to the relationships. Could I do something like:
MATCH p=(n {personID:3})(-[:CHILD]->()<-[:FATHER|MOTHER]-())* RETURN p
where the * means repeat the :CHILD then :FATHER|MOTHER relationships, with the directions being different? Obviously if the relationships were all the same direction, I could use
-[:CHILD|FATHER|MOTHER*]->
I want to be able to query it from Person #3 all the way to the top of the graph like a pedigree chart, but also be specific about how many levels if needed (like 3 generations, as opposed to end-of-line).
Another issue I'm having with this, is if I don't put directions on the relationships like -[:CHILD|FATHER|MOTHER*]-, then it will start at Person #3, and go both in the direction I want (alternating arrows), but also descend back down the chain finding all the other "cousins, aunts, uncles, etc.".
Any seasoned Cypher experts that an help me?
I am just on the same problem. And I found out that the APOC Expand path procedures are just accomplishing what you/we want.
Applied to your example, you could use apoc.path.subgraphNodes to get all ancestors of Person #3:
MATCH (p1:Person {personId:3})
CALL apoc.path.subgraphNodes(p1, {
sequence: '>Person,CHILD>,Family,<MOTHER|<FATHER'
}) YIELD node
RETURN node
Or if you want only ancestors up to the 3 generations from start person, add maxLevel: 6 to config (as one generation is defined by 2 relationships, 3 generations are 6 levels):
MATCH (p1:Person {personId:3})
CALL apoc.path.subgraphNodes(p1, {
sequence: '>Person,CHILD>,Family,<MOTHER|<FATHER',
maxLevel: 6
}) YIELD node
RETURN node
And if you want only ancestors of 3rd generation, i.e. only great-grandparents, you can also specify minLevel (using apoc.path.expandConfig):
MATCH (p1:Person {personId:3})
CALL apoc.path.expandConfig(p1, {
sequence: '>Person,CHILD>,Family,<MOTHER|<FATHER',
minLevel: 6,
maxLevel: 6
}) YIELD path
WITH last(nodes(path)) AS person
RETURN person
You could reverse the directionality of the CHILD relationships in your model, as in:
(p1:Person)<-[:CHILD]-(f:Family)<-[:FATHER|MOTHER]-(p2)
This way, you can use a simple -[:CHILD|FATHER|MOTHER*]-> pattern in your queries.
Reversing the directionality is actually intuitive as well, since you can then more naturally visualize the graph as a family tree, with all the arrows flowing "downwards" from ancestors to descendants.
Yeah, that's an interesting case. I'm pretty sure (though I'm open to correction) that this is just not possible. Would it be possible for you to have and maintain both? You could have a simple cypher query create the extra relationships:
MATCH (parent)-[:MOTHER|FATHER]->()<-[:CHILD]-(child)
CREATE (child)-[:CHILD_OF]->parent
Ok, so here's a thought:
MATCH path=(child:Person {personID: 3})-[:CHILD|FATHER|MOTHER*]-(ancestor:Person),
WHERE ancestor-[:MOTHER|FATHER]->()
RETURN path
Normally I'd use a second clause in the MATCH like this:
MATCH
path=(child:Person {personID: 3})-[:CHILD|FATHER|MOTHER*]-(ancestor:Person),
ancestor-[:MOTHER|FATHER]->()
RETURN path
But Neo4j (at least by default, I think) doesn't traverse back through the path. Maybe comma-separating would be fine and this would be a problem:
MATCH path=(child:Person {personID: 3})-[:CHILD|FATHER|MOTHER]-(ancestor:Person)-[:MOTHER|FATHER]->()
I'm curious to know what you find!

Finding minimum height of a binary tree

Please read my question before you report it as a duplicate. In the literature, to find the minimum height of a tree, the common approach is as follow:
int minDepth(TreeNode root) {
if (root == null) { return 0;}
return 1 + Math.min(minDepth(root.left), minDepth(root.right));
}
However, I think it does not distinguish between a leaf and a node with only one child and so it returns a wrong value. For example if our tree looks like this:
A is root
B is the left child of A
C is the right child of B
M is the left child of C
This function returns one while the leaf is 3 hop away from the root and so the min height is 4.
Since this recursive version is generally suggested in the literature, I think I am missing something.
Could somebody clear this for me?
Your comments indicate that the texts where you found this actually use the same definitions for the terms I use. If that is indeed the case, then the question is not why the algorithm you have shown is correct – it simply is wrong under those conditions.
Just take the third simplest binary tree, the one consisting of two nodes. It has exactly one leaf, its depth is two and its minimal depth is also two. But the algorithm you quoted returns the value one. So, unless the authors use a different definition for one of the terms (e.g., “minimal height” meaning “shortest path out of the tree”/“shortest path to a null pointer”), the result is simply wrong.

Resources