Is Control Flow Graph(CFG) subset of Abstract Syntax Tree (AST)? - abstract-syntax-tree

I know that Control Flow Graph(CFG) can be built from Abstract Syntax Tree (AST).
However, it is not clear to me whether CFG is a subset of AST?
As an example, given a CFG can we go back to AST?

However, it is not clear to me whether CFG is a subset of AST?
No, AST is an entirely syntactical construct. But a CST is well integrated with language's semantics. Also they represent different stuff.
As an example, given a CFG can we go back to AST?
It depends on the language, and how the constructs in that language maps to the control flow. For example these 2 snippets would construct entirely different ASTs;
print('a') if a else print('b')
if a:
print('a')
else:
print('b')
But the generated CFG would be equal for both of those.

Related

Is the "define" primitive of Scheme an imperative languages feature? Why or why not?

(define hypot
(lambda (a b)
(sqrt (+ (* a a) (* b b)))))
This is a Scheme programming language.
"define" creates a variable and global binding
lambda creates a procedure
I would like to know if "define" would be considered as an imperative language feature! As long as I know imperative feature is static scoping. I think it is an imperative feature since "define" create a global binding and static scoped looks at global binding for any variable definition where as in dynamic it looks at the current most active binding.
Please help me find the correct answer!! And I would like to know why or why not?
In a Scheme program (define var expr) statement is both a declaration and an initialization. Declarations introduce a new name into the scope. Declarations and initialization are present in both imperative and declarative languages.
However if the same variable is defined twice, then define behave as an assignment - which belongs to the imperative paradigm.
You've put your finger on a subtle and contentious issue. There have long been two informal camps on how define should work, which I would label (very imperfectly, and very controversially!) as the static vs. dynamic camps.
The static camp sees define as a non-side-effecting top-level declaration—it's a syntax that simply defines a name in a top-level scope, just like let is a syntax that defines a name in a local scope. A bit more precisely, this camp tends to see the top-level environment as equivalent to a big letrec with all the defines as the bindings, and all "loose" top-level expressions as the body. This is, incidentally, similar to the way that simple compilers work—read the whole program from one or more files, figure out all of the top-level bindings and generate code with knowledge of the whole program's source text.
The dynamic camp, on the other hand, tends to conceive of the top-level environment as a mutable data structure to which bindings can be added at runtime, and define is then an operation that modifies the top-level environment. This is, incidentally, similar to how simple interactive interpreters work—read definitions interactively from input, one at a time, and incorporate them into the environment as the user provides them.
To give one example, the SLIB library is one that I recall has been criticized for being much too firmly in the "dynamic" camp. If you read Section 1.1 on "features", you see this right from the beginning:
SLIB maintains a list of features supported by a Scheme session. The set of features provided by a session may change during that session.
The documentation for the require form that you use in SLIB to "load" modules continues with this:
Procedure: require feature
If (provided? feature) is true, then require just returns.
Otherwise, if feature is found in the catalog, then the corresponding files will be loaded and (provided? feature) will henceforth return #t. That feature is thereafter provided.
Otherwise (feature not found in the catalog), an error is signaled.
If you read this carefully, you will be struck that it's framing the whole thing as modules being "loaded" at runtime—and not as compile-time linking, which is foreign to the design.
So a "session" is a set of bindings whose keys—not just their values—changes during the runtime of the program. Programs are able to mutate the session with provide and require. They are able to directly observe the mutation with provided?. And it is implied that they can indirectly observe the set of identifiers bound in top-level environment change as a result of require—a call to require causes procedure invocations that would result in a runtime error before its invocation to no longer be so afterwards.
So we can't help but conclude that going by the philosophy of the people who designed this library, define is imperative. But not every Scheme user or implementer shares this philosophy.
First off Scheme is lexically scoped. Define usually is not limited to top level bindings like it is in Racket. It can create bindings within other procedure bodies.
In some implementations define can manipulate state but only for top level definitions. Otherwise it acts like let and binds a variable to the local scope. To actually take advantage of the top-level rebinding programatically is difficult.
So define doesn't introduce an imperative style into scheme code. Compare define to set! and its relatives, which by modify the variable in whatever environment it is bound, thereby allowing imperative style in scheme code.

Rationale behind Ada encapsulation of dynamically dispatching operations (primitives)

In Ada, Primitive operations of a type T can only be defined in the package where T is defined. For example, if a Vehicules package defines Car and Bike tagged record, both inheriting a common Vehicle abstract tagged type, then all operations than can dispatch on the class-wide Vehicle'Class type must be defined in this Vehicles package.
Let's say that you do not want to add primitive operations: you do not have the permission to edit the source file, or you do not want to clutter the package with unrelated features.
Then, you cannot define operations in other packages that implicitely dispatches on type Vehicle'Class.
For example, you may want to serialize vehicles (define a Vehicles_XML package with a To_Xml dispatching function) or display them as UI elements (define a Vehicles_GTK package with Get_Label, Get_Icon, ... dispatching functions), etc.
The only way to perform dynamic dispatch is to write the code explicitely; for example, inside Vechicle_XML:
if V in Car'Class then
return Car_XML (Car (V));
else
if V in Bike'Class then
return Bike_XML (Bike (V));
else
raise Constraint_Error
with "Vehicle_XML is only defined for Car and Bike."
end if;
(And a Visitor pattern defined in Vehicles and used elsewhere would work, of course, but that still requires the same kind of explicit dispatching code. edit in fact, no, but there is still some boilerplate code to write)
My question is then:
is there a reason why operations dynamically dispatching on T are restricted to be defined in the defining package of T?
Is this intentional? Is there some historical reasons behind this?
Thanks
EDIT:
Thanks for the current answers: basically, it seems that it is a matter of language implementation (freezing rules/virtual tables).
I agree that compilers are developped incrementally over time and that not all features fit nicely in an existing tool.
As such, isolating dispatching operators in a unique package seems to be a decision mostly guided by existing implementations than by language design. Other languages outside of the C++/Java family provide dynamic dispatch without such requirement (e.g. OCaml, Lisp (CLOS); if that matters, those are also compiled languages, or more precisely, language for which compilers exist).
When I asked this question, I wanted to know if there were more fundamental reasons, at language specification level, behind this part of Ada specifications (otherwise, does it really mean that the specification assumes/enforces a particular implementation of dynamic disapatch?)
Ideally, I am looking for an authoritative source, like a rationale or guideline section in Reference Manuals, or any kind of archived discussion about this specific part of the language.
I can think of several reasons:
(1) Your example has Car and Bike defined in the same package, both derived from Vehicles. However, that's not the "normal" use case, in my experience; it's more common to define each derived type in its own package. (Which I think is close to how "classes" are used in other compiled languages.) And note also that it's not uncommon to define new derived types afterwards. That's one of the whole points of object-oriented programming, to facilitate reuse; and it's a good thing if, when designing a new feature, you can find some existing type that you can derive from, and reuse its features.
So suppose you have your Vehicles package that defines Vehicle, Car, and Bike. Now in some other package V2, you want to define a new dispatching operation on a Vehicle. For this to work, you have to provide the overriding operations for Car and Bike, with their bodies; and assuming you are not allowed to modify Vehicles, then the language designers have to decide where the bodies of the new operation have to be. Presumably, you'd have to write them in V2. (One consequence is that the body that you write in V2 would not have access to the private part of Vehicles, and therefore it couldn't access implementation details of Car or Bike; so you could only write the body of that operation if terms of already-defined operations.) So then the question is: does V2 need to provide operations for all types that are derived from Vehicle? What about types derived from Vehicle that don't become part of the final program (maybe they're derived to be used in someone else's project)? What about types derived from Vehicle that haven't yet been defined (see preceding paragraph)? In theory, I suppose this could be made to work by checking everything at link time. However, that would be a major paradigm change for the language. It's not something that could be easily. (It's pretty common, by the way, for programmers to think "it would be nice to add feature X to a language, and it shouldn't be too hard because X is simple to talk about", without realizing just what a vast impact such a "simple" feature would have.)
(2) A practical reason has to do with how dispatching is implemented. Typically, it's done with a vector of procedure/function pointers. (I don't know for sure what the exact implementation is in all cases, but I think this is basically the case for every Ada compiler as well as for C++ and Java compilers, and probably C#.) What this means is that when you define a tagged type (or a class, in other languages), the compiler will set up a vector of pointers, and based on how many operations are defined for the type, say N, it will reserve slots 1..N in the vector for the addresses of the subprograms. If a type is derived from that type and defines overriding subprograms, the derived type gets its own vector, where slots 1..N will be pointers to the actual overriding subprograms. Then, when calling a dispatching subprogram, a program can look up the address in some known slot index assigned to that subprogram, and it will jump to the correct address depending on the object's actual type. If a derived type defines new primitive subprograms, new slots are assigned N+1..N2, and types derived from that could define new subprograms that get slots N2+1..N3, and so on.
Adding new dispatching subprograms to Vehicle would interfere with this. Since new types have been derived from Vehicle, you can't insert a new area into the vector after N, because code has already been generated that assumes the slots starting at N+1 have been assigned to new operations derived for derived types. And since we may not know all the types that have been derived from Vehicle and we don't know what other types will be derived from Vehicle in the future and how many new operations will be defined for them, it's hard to pick some other location in the vector that could be used for the new operations. Again, this could be done if all of the slot assignment were deferred until link time, but that would be a major paradigm change, again.
To be honest, I can think of other ways to make this work, by adding new operations not in the "main" dispatch vector but in an auxiliary one; dispatching would probably require a search for the correct vector (perhaps using an ID assigned to the package that defines the new operations). Also, adding interface types to Ada 2005 has already complicated the simple vector implementation somewhat. But I do think this (i.e. it doesn't fit into the model) is one reason why the ability to add new dispatching operations like you suggest isn't present in Ada (or in any other compiled language that I know of).
Without having checked the rationale for Ada 95 (where tagged types were introduced), I am pretty sure the freezing rules for tagged types are derived from the simple requirement that all objects in T'Class should have all the dispatching operations of type T.
To fulfill that requirement, you have to freeze type and say that no more dispatching operations can be added to type T once you:
Derive a type from T, or
Are at the end of the package specification where T was declared.
If you didn't do that, you could have a type derived from type T (i.e. in T'Class), which hadn't inherited all the dispatching operations of type T. If you passed an object of that type as a T'Class parameter to a subprogram, which knew of one more dispatching operation on type T, a call to that operation would have to fail. - We wouldn't want that to happen.
Answering your extended question:
Ada comes with both a Reference Manual (the ISO standard), a Rationale and an Annotated Reference Manual. And a large part of the discussions behind these documents are public as well.
For Ada 2012 see http://www.adaic.org/ada-resources/standards/ada12/
Tagged types (dynamic dispatching) was introduced in Ada 95. The documents related to that version of the standard can be found at http://www.adaic.org/ada-resources/standards/ada-95-documents/

What is the difference between self-modifying code and reflection?

Self-modifying code is code that "alters its own instructions while it is executing". This is not typically done outside of assembly language or viruses.
Reflection is just the ability of a program to access its own namespace dynamically, in order to reference functions and classes and variables dynamically. According to this article, reflection is not just introspection (a program's ability to examine itself), but also intercession (a program's ability to modify itself).
So, is the difference that reflection refers to a mild form of self-modifying code where only the variable/class/function name gets "modified" within the instructions? That is, reflection is a milder, less "dramatic" form of modification compared with the ability to modify the nature of the entire instruction itself as in self-modifying code.
Do I have this distinction right?
No, one is about changing the code during execution. The other about reading the structure and metadata (introspection) of the code during execution.
They can be mutually exclusive. You don't need to know how the code is to modify it (if the OS allows you that is).
Typically you can use reflection to execute code in a non 'normal use case' manner, but it's still the same code. contrast this to modifying the code.
The goals are completely non aligned.
However I suppose one example where they intersect in a small way is to consider a function (F) that calls two other functions - A then B. You could reflect that knowledge and then call B then A (thus modifying the use case of (F)). as you can see it's not not modifying code, rather just the intended use case.

Is there a programming language that meets these requirements?

I have a few requirements before implementing my next program - hopefully a programming language exists that can do the following:
Given a class (or interface) C, the programming language allows the user access to a list of all classes which extend/implement C.
The programming language allows the user to iterate through all the variables and methods of a class.
The user is able to determine the number and types of arguments a function will take.
eg. foo(int a, String b, int c) can be queried
to return 3 or [int, String, int]
Are these absurd requirements or does some language implement them as basic techniques of reflection?
I know that you prefer a statically-typed language, but if you consider a dynamically-typed one Smalltalk may be a good fit, since everything is an object (classes and methods are no exception ot this rule) and thus everything can be manipulated (not only queried, but also changed). Going to your requirements:
Given a class (or interface) C, the programming language allows the
user access to a list of all classes which extend/implement C.
In Smalltalk there is no built-in notion of interface (though I think that I've seen extensions that added support for it). However, you can:
Given a class, find it direct subclasses: Number subclasses answers {Fraction. Float. Integer}.
Or all the hierarchy under it: Number allSubclasses answers an OrderedCollection(Fraction Float Integer ScaledDecimal SmallInteger LargePositiveInteger LargeNegativeInteger)
You can also find all classes that implement a given selector (pop in this case):
SystemNavigation default allClassesImplementing: #pop answers {ContextPart. FileSystemGuide. LIFOQueue. Stack}
As you can see, defining an "Interface" object to query for classes that implement a set of methods is quite easy (just have a collection of method names and query for classes implementing each of them, adding the classes to a set). However if you want to explicitly state in the class that it implements an interface, then you'll need to do more work.
The programming language allows the user to iterate through all the
variables and methods of a class.
Point instVarNames answers #('x' 'y')
Point allMethods answers a collection of CompiledMethods (the object that represents a method)
Point allSelectors answers a collection of all the method names that an instance of that class can answer to.
The user is able to determine the number and types of arguments a
function will take.
In this case you interact with compiled methods and ask them for the number of arguments they need (there is no notion of parameter type):
(Point methodNamed: #x) numArgs answers 0, since it is just a getter.
(Point methodNamed: #+) numArgs answers 1
This is just a small preview of the reflective capabilities of Smalltalk; if you want to go deeper you can check out some of these links:
Reflective Facilities in Smalltalk-80
Evaluating message passing control techniques in Smalltalk
Debugging Objects
HTH
I would expect most Lisp systems (Scheme, CommonLisp, ...) to meet these requirements.
Java can do this. Yet, note that no language will do:
Given a class (or interface) C, the programming language allows the
user access to a list of all classes which extend/implement C
The reason is that the number of classes that extend C is either zero (in case of final classes) or infinite. In the latter case, which is the norm, only a tiny portion of all classes that extend C has actually been written down and compiled, and only those you can access.

Most common pattern for using a database in a functional language, given desire for no side-effects?

I'm trying to get my head around a core concept of functional langauges:
"A central concept in functional languages is that the result of a function is determined by its input, and only by its input. There are no side-effects!"
http://www.haskell.org/haskellwiki/Why_Haskell_matters#Functions_and_side-effects_in_functional_languages
My question is, if a function makes changes only within its local environment, and returns the result, how can it interact with a database or a file system? By definition, wouldn't that be accessing what is in effect a global variable or global state?
What is the most common pattern used to get around or address this?
Just because a functional language is functional (Maybe even completely pure like Haskell!), it doesn't mean that programs written in that language must be pure when ran.
Haskell's approach, for example, when dealing with side-effects, can be explained rather simply: Let the whole program itself be pure (meaning that functions always return the same values for the same arguments and don't have any side effect), but let the return value of the main function be an action that can be ran.
Trying to explain this with pseudocode, here is some program in an imperative, non-functional language:
main:
read contents of abc.txt into mystring
write contents of mystring to def.txt
The main procedure above is just that: a series of steps describing how to perform a series of actions.
Compare this to a purely functional language like Haskell. In functional languages, everything is an expression, including the main function. One can thus read the equivalent of the above program like this:
main = the reading of abc.txt into mystring followed by
the writing of mystring to def.txt
So, main is an expression that, when evaluated, will return an action describing what to do in order to execute the program. The actual execution of this action happens outside of the programmers world. And this is really how it works; the following is an actual Haskell program that can be compiled and ran:
main = readFile "abc.txt" >>= \ mystring ->
writeFile "def.txt" mystring
a >>= b can be said to mean "the action a followed by the result of a given to action b" in this situation, and the result of the operator is the combined actions a and b. The above program is of course not idiomatic Haskell; one can rewrite it as follows (removing the superfluous variable):
main = readFile "abc.txt" >>=
writeFile "def.txt"
...or, using syntactic sugar and the do-notation:
main = do
mystring <- readFile "abc.txt"
writeFile "def.txt" mystring
All of the above programs are not only equivalent, but they are identical as far as the compiler is concerned.
This is how files, database systems, and web servers can be written as purely functional programs: by threading action values through the program so that they are combined, and finally end up in the main function. This gives the programmer enormous control over the program, and is why purely functional programming languages are so appealing in some situations.
The most common pattern for dealing with side-effects and impurity in functional languages is:
be pragmatic, not a purist
provide built-ins that allow impure code and side-effects
use them as little as possible!
Examples:
Lisp/Scheme: set!
Clojure: refs, and using mutating methods on java objects
Scala: creating variables with var
ML: not sure of specifics, but Wikipedia says it allows some impurity
Haskell cheats a little bit -- its solution is that for functions that access the file system, or the database, the state of the entire universe at that instant, including the state of the filesystem/db, will be passed in to the function.(1) Thus, if you can replicate the state of the entire universe at that instant, then you can get the same results twice from such a function. Of course, you can't replicate the state of the entire universe at that instant, and so the functions return different values ...
But Haskell's solution, IMHO, is not the most common.
(1) Not sure of the specifics here. Thanks to CAMcCann for pointing out that this metaphor is overused and maybe not all that accurate.
Acessing a database is no different from other cases of input-output, such as print(17).
In eagerly evaluated languages, like LISP and ML the usual approach to effectful programming is just using side effects, like in most other programming languages.
In Haskell, the solution for the IO problem is using monads. For example, if you check HDBC, a haskell database library, you can see lots of functions there return IO actions.
Some languages, like Clean, use uniqueness-types to enforce the same kind of sequentiality Haskell does with monads but these languages are harder to find nowadays.

Resources