Short implementation examples of abstract interpretation

I am taking a course on abstract interpretation, but I haven't seen any examples of how the theory maps down to actual code.
I am looking for short code examples, where I preferably won't have to work with a whole compiler. The analysis doesn't have to be useful, I would just like to see an example where the analysis is derived and then implemented.
Does anyone know of any such examples, perhaps from a university course?

Abstract interpretation is based on a mathematical theory called Galois connections. The idea is simple:
Abstract the behaviour of the program.
Perform the analysis on the abstract level.
Use a Galois connection to relate the concrete and the abstract program (a tiny sketch follows below).
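To make the recipe concrete, here is a minimal, self-contained sketch in OCaml (my own toy example, not taken from any particular course): a sign analysis for arithmetic expressions. The abstract domain {Neg, Zero, Pos, Top} abstracts sets of integers, and the abstract operators over-approximate the concrete ones, which is exactly what the Galois connection demands.

    (* Abstract domain: the sign of an integer; Top means "any integer". *)
    type sign = Neg | Zero | Pos | Top

    (* Abstraction of a single constant (the alpha of the Galois
       connection, restricted to singleton sets). *)
    let alpha n = if n < 0 then Neg else if n = 0 then Zero else Pos

    (* Abstract addition: a sound over-approximation of +. *)
    let add_sign a b =
      match a, b with
      | Zero, s | s, Zero -> s
      | Pos, Pos -> Pos
      | Neg, Neg -> Neg
      | _ -> Top                     (* e.g. Pos + Neg could be anything *)

    (* Abstract multiplication: a sound over-approximation of *. *)
    let mul_sign a b =
      match a, b with
      | Zero, _ | _, Zero -> Zero
      | Top, _ | _, Top -> Top
      | Pos, Pos | Neg, Neg -> Pos
      | _ -> Neg

    type expr = Const of int | Add of expr * expr | Mul of expr * expr

    (* The analysis: run the program in the abstract domain. *)
    let rec analyze = function
      | Const n -> alpha n
      | Add (a, b) -> add_sign (analyze a) (analyze b)
      | Mul (a, b) -> mul_sign (analyze a) (analyze b)

    (* analyze (Mul (Const (-2), Const 3)) evaluates to Neg. *)

The analysis is useless for real programs (no variables, no loops), but it shows the whole pipeline: a concrete language, an abstract domain, and an abstract interpreter obtained by swapping concrete operators for abstract ones.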
This is the best tutorial I have seen so far about Abstract Interpretation:

There is this paper by Bertot,
Structural abstract interpretation: A formal study using Coq,
that gives a full implementation of an abstract interpreter for a simple toy language using the Coq proof assistant. I used this as a concrete reference and found it useful, although a little hard going, which is to be expected given the subject matter. Coq is a great little piece of software.
I also came across, in a Cousot paper,
A gentle introduction to formal verification of computer systems by abstract interpretation,
rough details of an implementation in Astrée (but I am sure there will be useful citations for full details). I am not familiar with Astrée, so I didn't actually read that section, but I think it meets your criteria.
If you come across any more, please let me know! I would especially like to see a Prolog abstract interpreter.

Maybe this tool is also interesting for you:
Interproc Analyzer
It is an abstract analyzer for a very simple language, which nevertheless offers interprocedural analyses. You can try out the analysis and get numerical invariants about the analyzed program. The source code is available (OCaml).
A really thorough and precise course, given by one of the "creators" of Abstract Interpretation, Patrick Cousot (already mentioned in one of the answers):
MIT course on Abstract Interpretation. The course also offers assignments in OCaml.

There is MonoREIL, which comes with the recently open sourced tool BinNavi.
Here is a short intro.
Note that the context of the MonoREIL framework is not compilers but the analysis of binary code. Yet, it has been used for real world applications, see slide 34 ff of this introduction (which contains more formal background).

Related

An example of practical application of Isabelle/HOL

I have looked into the Isabelle tutorial, which presents an example of its use in verifying a security protocol. However, it is a bit beyond my understanding, as I only know the basics. I'm looking for some examples which are not just simple theorems but practical applications using Isabelle/HOL.
For example, proving algorithms correct, verifying properties of a system, or proving some non-trivial mathematical theorem. Are such examples available anywhere?
I have looked into the list of applications provided on the official Isabelle page, but most of them are proofs of theorems.
I am also looking at an example of file system verification using Alloy. It provides a proof where properties of files/directories can be verified. I'm looking for something similar to it.
A few highly non-trivial examples I can think of right now are:
seL4, an entire operating system kernel written in C that was verified with Isabelle.
The AFP entry Jinja_Threads contains, as far as I know, a fully formalised bytecode compiler for a Java-like language with arrays and threads.
Jeremy Avigad's proof of the Prime Number Theorem.
The proof of Kepler's conjecture. A part of this was done in Isabelle; most of it, however, was done in the more ‘basic’ theorem prover HOL Light, whose logic is similar to Isabelle/HOL's.
As Joachim mentioned, I am sure you can find more interesting applications in the AFP.

Prerequisite knowledge for understanding the Definition of Standard ML

Could anyone tell me what the background theories behind The Definition of Standard ML are? I find it very interesting and beautiful. I did learn a little SML, but I want more, and I don't know where to start (to understand the Definition).
Thanks in advance.
For the old version of the Definition (SML'90) there actually was a separate book called "Commentary on Standard ML", which explained how to interpret the Definition. Both the SML'90 Definition and the Commentary are long out of print, but fortunately, are available as free PDFs.
The SML'90 Definition differed from SML'97 in some respects, in particular regarding the module system. Overall, it was more complicated. But much of the Commentary should still apply, and if you have both versions side by side, it shouldn't be hard to figure out what's still relevant.
This is off the top of my head: in order to understand the methods employed in The Definition of Standard ML, you should have some basic understanding of:
set theory
functions
first-order logic
type theory
Additionally, you should be able to read and understand inference rules of which the book makes extensive use.
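For example, a typing rule for function application written as an inference rule (the rules in the Definition additionally thread environments and state, but the shape is the same):

    \[
    \frac{\Gamma \vdash e_1 : \tau \to \tau' \qquad \Gamma \vdash e_2 : \tau}
         {\Gamma \vdash e_1 \; e_2 : \tau'}
    \]

Read: if, in context \Gamma, e_1 has the function type \tau \to \tau' and e_2 has type \tau, then the application e_1 e_2 has type \tau'.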

Technical choices in unmarshalling hash-consed data

There seems to be quite a bit of folklore knowledge floating about in restricted circles about the pitfalls of hash-consing combined with marshalling-unmarshalling of data. I am looking for citable references to these tidbits.
For instance, someone once pointed me to library aterm and mentioned that the authors had clearly thought about this and that the representation on disk was bottom-up (children of a node come before the node itself in the data stream). This is indeed the right way to do things when you need to re-share each node (with a possible identical node already in memory). This re-sharing pass needs to be done bottom-up, so the unmarshalling itself might as well be, too, so that it's possible to do everything in a single pass.
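To illustrate, here is a minimal sketch of the bottom-up re-sharing idea; the type, the wire format, and all names here are hypothetical, and a real aterm file looks different:

    (* A node arrives as (head, indices of already-read children), so a
       single left-to-right pass sees every child before its parent. *)
    type term = { id : int; head : string; args : term list }

    (* Hash-cons table: structural key -> canonical (shared) node. *)
    let table : (string * int list, term) Hashtbl.t = Hashtbl.create 1024
    let counter = ref 0

    let intern head args =
      let key = (head, List.map (fun t -> t.id) args) in
      match Hashtbl.find_opt table key with
      | Some t -> t            (* re-share with a node already in memory *)
      | None ->
          incr counter;
          let t = { id = !counter; head; args } in
          Hashtbl.add table key t;
          t

    (* Unmarshal and re-share in one bottom-up pass; the last node is
       taken to be the root. *)
    let unmarshal (stream : (string * int list) array) : term =
      let nodes = Array.make (Array.length stream) None in
      Array.iteri
        (fun i (head, children) ->
          let args = List.map (fun j -> Option.get nodes.(j)) children in
          nodes.(i) <- Some (intern head args))
        stream;
      Option.get nodes.(Array.length stream - 1)

If the stream were top-down instead, re-sharing would need a second pass (or backpatching), since a node's key can only be computed once its children are canonical.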
I am in the process of describing difficulties encountered in our own context, and the solutions we found. I would appreciate any citable reference to the kind of aforementioned folklore knowledge. Some people obviously have encountered the problems before (the aterm library is only one example). But I didn't find anything in writing. Even the little piece of information I have about aterm is hearsay. I am not worried it's not reliable (you can't make this up), but "personal communication" and "look how it's done in the source code" are considered poor form in citations.
I have enough references on hash-consing alone. I am only interested in references where it interferes with other aspects of programming, such as marshalling or distribution.
OK, this is not much more use, but Andrew Kennedy wrote a functional pearl called simply Pickling Combinators, which appeared in the Journal of Functional Programming 14(6):727-739, 2004. There is extensive discussion of structure sharing and how it is handled in pickles, but no direct discussion of how this problem might relate to hash-consing in the implementation of the language. But the article does discuss structure sharing in memory as well as in a pickle, so I hope it is better than nothing.
Martin Elsman had a follow-on paper in 2005 in Trends in Functional Programming; the title is Type-specialized serialization with sharing. The article deals primarily with hash-consing by the unpickler (deserializer), not with hash-consing in the implementation, but again it may be worth something.
The JFP paper is proprietary, but there appears to be a preprint on Andrew's web page.
Elsman's paper appears to be available through Google Scholar at http://tinyurl.com/yd5tw2b.
(In a previous life, I worked on a project to create ASCII pickles that people could read and edit. I stupidly failed to publish it, but I have retained an interest.)
I found one reference on marshalling in functional languages; not sure if it will be useful, but the authors are smart: http://tinyurl.com/yc3hob9
I believe that Matthias Blume and/or Andrew Appel did something on this, but I can't find the paper. I also believe I reviewed something once for the Journal of Functional Programming, but I can't remember if the paper was accepted or who wrote it.
I suggest you ask Matthias Blume, Andrew Appel, and Phil Wadler if they can help.
Coq V5.10 had hash-consing and marshaling/unmarshaling. I didn't find anything in published form, but the unmarshaling steps are referred to as "reinterning" in the source code. Coq unmarshaled values and then traversed them in order to re-create sharing, the obvious (and only) solution when all the language provides is an unmarshal function of type in_channel -> 'a.

Mathematical notation of programming concepts

There are many methods for representing the structure of a program (like UML class diagrams etc.). I am interested in whether there is a convention which describes programs in a strict, mathematical way. I am especially interested in the use of mathematical notation for this purpose.
An example: classes are represented as sets (fields, properties) and functions (operating on the elements of those sets). A parent class's fields are a subset of the child class's. Functions are described in pseudocode which has to look like this and that...
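For instance, the class example might be written like this (my ad-hoc formalization, just to illustrate the kind of notation I mean, not an established convention):

    \[
    C = (F_C, M_C), \qquad
    C' \text{ inherits from } C \;\Longrightarrow\; F_C \subseteq F_{C'}
    \]

where a class C is modelled as a pair of its set of fields F_C and its set of methods M_C.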
I know that Z Notation has been used to some extent in the formal verification of software, such as the Tokeneer project.
Z Notation
Z Reference Manual
http://www.amazon.com/Concrete-Mathematics-Foundation-Computer-Science/dp/0201558025
Yes, there is: Floyd-Hoare logic.
There are a lot of ways, but I think most of them are inconvenient for expressing structure, since structure is often not expressible in standard mathematical concepts. The main exception is of course functional programming languages. Think about folds (catamorphisms), groups, algebras, etc.
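A minimal sketch of what I mean, in OCaml (my own example): sum and length are the same catamorphism with different algebras plugged in.

    (* fold_right is the catamorphism of the list structure. *)
    let rec fold_right f xs z =
      match xs with
      | [] -> z
      | x :: rest -> f x (fold_right f rest z)

    (* Two instances: same recursion scheme, different "algebras". *)
    let sum xs = fold_right ( + ) xs 0
    let length xs = fold_right (fun _ n -> n + 1) xs 0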
For imperative programming I know of the existence of Z, which uses (pure and extended) lambda calculus, set theory, and (first-order) predicate logic. However, I don't think it's very convenient. The main upside of using mathematics to express structure is the fact that you can prove things about it. But if you want to do that, take a look at JML, Spec#, or Eiffel.
Depends on what you're trying to accomplish, but going down this road with specific languages can get you into trouble.
For example, see the circle-ellipse discussion on C++ FAQ Lite.
This book applies the deductive method to programming by affiliating programs with the abstract mathematical theories that enable them to work. [...]
I believe that Elements of Programming by Alexander Stepanov and Paul McJones, is pretty close to what you are looking for.
Concepts
A concept is a description of requirements on one or more types stated in terms of the existence and properties of procedures, type attributes, and type functions defined on the types.
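The book states its concepts against a C++ subset; as a rough analogue in OCaml terms (my comparison, not the authors'), a concept resembles a module signature, which also states requirements on a type via the operations it must support:

    (* A "concept": any type t with an identity element and an
       associative combine operation (the associativity is a stated
       property, not checked by the compiler). *)
    module type MONOID = sig
      type t
      val empty : t
      val combine : t -> t -> t
    end

    (* A model of the concept. *)
    module IntSum : MONOID = struct
      type t = int
      let empty = 0
      let combine = ( + )
    end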
Z, which has already been mentioned, is pretty much what you describe. There are some variants of it for object-oriented modelling, but I think you can get quite far with "standard Z's" schemas if you wish to model classes.
There's also Alloy, which is newer and inspired by Z. Its notation is perhaps a bit closer to object-orientation. It is also analysable, i.e. you can check the models you create whether they fulfill certain conditions, but it cannot prove that properties hold, just attempt to refute within a finite scope.
The article Dependable Software by Design is a nice introduction to Alloy and its ilk, along with a table of available similar tools.
You are looking for functional programming. There are several functional programming languages, and they are all based on a fundamental mathematical theory called the Lambda calculus. Programs written in a functional programming language such as LISP are a mathematical representation of themselves. ;-)
There is a mathematical language which actually describes a program, or rather its operations. You take the initial state and then transform this state until you reach the desired target state. The transformations yield the program code which must be executed.
See the Wikipedia article about Hoare logic.
The basic idea is that for every function (no matter if you put it into a class or into an old-style function), you have a pre- and a post-condition. For example, the precondition can be that you have an array which has >= 0 elements. The postcondition is that element[i] must be <= element[j] for every i <= j.
The usual description would be "the function sorts the array". But the mathematical terms allow you to transform the input (which must match the precondition) into the output (which must match the postcondition).
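In Hoare-logic notation the sorting example reads roughly like this (a full specification would also require the output to be a permutation of the input):

    \[
    \{\, n \ge 0 \,\} \;\; \mathit{sort}(a) \;\;
    \{\, \forall i, j.\; 0 \le i \le j < n \Rightarrow a[i] \le a[j] \,\}
    \]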
It's a bit unwieldy to use, especially for more complex programs but some of the examples are pretty impressive. Often, you get really compact code as the result which looks quite complex but works at first try.
I'd like to suggest Algebra of Programming. It's a calculational approach to programs, using relational algebra and Galois connections.
If you have further interest in this topic, you can find an amazing paper, here, by Shin-Cheng Mu and José Nuno Oliveira (slides).
Using relational algebra and first-order logic also has a nice synergy with Alloy, functional programming, and Design by Contract (easily applied to object-oriented programming).

Concepts that surprised you when you read SICP?

SICP - "Structure and Interpretation of Computer Programs"
An explanation for each would be nice.
Can someone explain metalinguistic abstraction?
SICP really drove home the point that it is possible to look at code and data as the same thing.
I understood this before when thinking about universal Turing machines (the input to a UTM is just a representation of a program) or the von Neumann architecture (where a single storage structure holds both code and data), but SICP made the idea much more clear. Scheme (Lisp) helped here, as the syntax for a program is exactly the same as the syntax for lists in general, namely S-expressions.
Once you have the "equivalence" of code and data, suddenly a lot of things become easy. For example, you can write programs that have different evaluation methods (lazy, nondeterministic, etc). Previously, I might have thought that this would require an extension to the programming language; in reality, I can just add it on to the language myself, thus allowing the core language to be minimal. As another example, you can similarly implement an object-oriented framework; again, this is something I might have naively thought would require modifying the language.
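A tiny sketch of the idea in OCaml (my example, not from the book): once a program is an ordinary data structure, "evaluation" is just one of many functions you can write over it.

    (* A program represented as data. *)
    type expr =
      | Num of int
      | Add of expr * expr
      | Mul of expr * expr

    (* One way to consume the data: evaluate it. *)
    let rec eval = function
      | Num n -> n
      | Add (a, b) -> eval a + eval b
      | Mul (a, b) -> eval a * eval b

    (* Another way: pretty-print it. A lazy or nondeterministic
       evaluator would just be yet another function over expr. *)
    let rec show = function
      | Num n -> string_of_int n
      | Add (a, b) -> "(" ^ show a ^ " + " ^ show b ^ ")"
      | Mul (a, b) -> "(" ^ show a ^ " * " ^ show b ^ ")"

    let () =
      let prog = Add (Num 1, Mul (Num 2, Num 3)) in
      Printf.printf "%s = %d\n" (show prog) (eval prog)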
Incidentally, one thing I wish SICP had mentioned more: types. Type checking at compilation time is an amazing thing. The SICP implementation of object-oriented programming did not have this benefit.
I haven't read the book yet; I have only watched the video courses, but they taught me a lot. Functions as first-class citizens were mind-blowing for me. Executing a "variable" was something very new to me. After watching those videos, the way I now see JavaScript and programming in general has greatly changed.
Oh, I think I've lied, the thing that really struck me was that + was a function.
I think the most surprising thing about SICP is to see how few primitives are actually required to make a Turing complete language--almost anything can be built from almost nothing.
Since we are discussing SICP, I'll put in my standard plug for the video lectures at http://groups.csail.mit.edu/mac/classes/6.001/abelson-sussman-lectures/, which are the best Introduction to Computer Science you could hope to get in 20 hours.
The one that I thought was really cool was streams with delayed evaluation. The one about generating primes was something I thought was really neat: like a "PEZ" dispenser that magically dispenses the next prime in the sequence.
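Something in that spirit can be written with OCaml's lazy Seq type (my sketch; SICP builds its streams from cons-stream and delay/force, and uses a sieve rather than trial division):

    (* All integers from n, produced on demand. *)
    let rec from n : int Seq.t = fun () -> Seq.Cons (n, from (n + 1))

    let is_prime n =
      let rec check d = d * d > n || (n mod d <> 0 && check (d + 1)) in
      n >= 2 && check 2

    (* An infinite stream of primes; nothing is computed until pulled. *)
    let primes : int Seq.t = Seq.filter is_prime (from 2)

    (* The "PEZ dispenser": pull the next k primes. *)
    let rec dispense k s =
      if k > 0 then
        match s () with
        | Seq.Nil -> ()
        | Seq.Cons (p, rest) -> Printf.printf "%d " p; dispense (k - 1) rest

    let () = dispense 10 primes; print_newline ()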
One example of "the data and the code are the same thing" from A. Rex's answer got me in a very deep way.
When I was taught Lisp back in Russia, our teachers told us that the language was about lists: car, cdr, cons. What really amazed me was the fact that you don't need those functions at all - you can write your own, given closures. So, Lisp is not about lists after all! That was a big surprise.
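That exercise transcribes almost literally into other functional languages; a sketch in OCaml:

    (* cons, car and cdr built from nothing but closures:
       a pair is a function that hands its two parts to a selector. *)
    let cons a b = fun select -> select a b
    let car p = p (fun a _ -> a)
    let cdr p = p (fun _ b -> b)

    let () =
      let p = cons 1 2 in
      Printf.printf "%d %d\n" (car p) (cdr p)   (* prints: 1 2 *)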
A concept I was completely unfamiliar with was the idea of coroutines, i.e. having two functions doing complementary work and having the program flow control alternate between them.
I was still in high school when I read SICP, and I had focused on the first and second chapters. For me at the time, I liked that you could express all those mathematical ideas in code, and have the computer do most of the dirty work.
When I was tutoring SICP, I was impressed by different aspects. For one, the conundrum that data and code are really the same thing, because code is executable data. The chapter on metalinguistic abstractions is mind-boggling to many and has many take-home messages. The first is that all the rules are arbitrary. This bothers some students, especially those who are physicists at heart. I think the beauty is not in the rules themselves, but in studying the consequences of the rules. A one-line change in code can mean the difference between lexical scoping and dynamic scoping.
Today, though SICP is still fun and insightful to many, I do understand that it's becoming dated. For one, it doesn't teach debugging skills and tools (I include type systems in there), which are essential for working in today's gigantic systems.
I was most surprised by how easy it is to implement languages, so easy that one could write an interpreter for Scheme on a blackboard.
I understood recursion in a different sense after reading some of the chapters of SICP.
I am right now in the section "Sequences as Conventional Interfaces" and have found the concept of procedures as first-class citizens quite fascinating. Also, the application of recursion there is something I have never seen in any language.
Closures.
Coming from a primarily imperative background (Java, C#, etc. -- I only read SICP a year or so ago for the first time, and am re-reading it now), thinking in functional terms was a big revelation for me; it totally changed the way I think about my work today.
I read most of the book (without doing the exercises). What I have learned is how to abstract the real world at a specific level, and how to implement a language.
Each chapter has ideas that surprised me:
The first two chapters showed me two ways of abstracting the real world: abstraction with procedures, and abstraction with data.
Chapter 3 introduces time into the real world, which results in state. We try assignment, which raises problems; then we try streams.
Chapter 4 is about metalinguistic abstraction, in other words, we implement a new language by constructing an evaluator, which determines the meaning of expressions.
Since the evaluator in Chapter 4 is itself a Lisp program, it inherits the control structure of the underlying Lisp system. So in Chapter 5, we dive into the step-by-step operation of a real computer with the help of an abstract model, the register machine.
Thanks.
