Extracting BNF data in Isabelle

Extracting BNF data in Isabelle - isabelle

I'm studying Isabelle's encoding of (co)datatypes. I was wondering if there is a way of defining a datatype, say:
datatype 'a tree = Node 'a ('a tree fset)
and then inspecting the BNF it generates.

You can use the command print_bnfs. Moreover, of course, you can see all generated theorems by using print_theorems immediately after the definition of the datatype. Perhaps, if you need further insight, you could try exploring the ML infrastructure associated with the BNFs.
As a side note, it is possible to see a list of all available commands using the command print_commands.

Related

What's a cterm?

The Isabelle implementation manual says:
Types ctyp and cterm represent certified types and terms, respectively. These are abstract datatypes that guarantee that its values have passed the full well-formedness (and well-typedness) checks, relative to the declarations of type constructors, constants etc. in the background theory.
My understanding is: when I write cterm t, Isabelle checks that the term is well-built according to the theory where it lives in.
The abstract types ctyp and cterm are part of the same inference kernel that is mainly responsible for thm. Thus syntactic operations on ctyp and cterm are located in the Thm module, even though theorems
are not yet involved at that stage.
My understanding is: if I want to modify a cterm at the ML level I will use operations of the Thm module (where can I find that module?)
Furthermore, it looks like cterm t is an entity that converts a term of the theory level to a term of the ML-level. So I inspect the code of cterm in the declaration:
ML_val ‹
some_simproc #{context} #{cterm "some_term"}
›
and get to ml_antiquotations.ML:
ML_Antiquotation.value \<^binding>‹cterm› (Args.term >> (fn t =>
"Thm.cterm_of ML_context " ^ ML_Syntax.atomic (ML_Syntax.print_term t))) #>
This line of code is unreadable to me with my current knowledge.
I wonder if someone could give a better low-level explanation of cterm. What is the meaning of the code below? Where are located the checks that cterm performs on theory terms? Where are located the manipulations that we can do on cterms (the module Thm above)?

The ‘c’ stands for ‘certified’ (Or ‘checked’? Not sure). A cterm is basically a term that has been undergone checking. The #{cterm …} antiquotation allows you to simply write down terms and directly get a cterm in various contexts (in this case probably the context of ML, i.e. you directly get a cterm value with the intended content). The same works for regular terms, i.e. #{term …}.
You can manipulate cterms directly using the functions from the Thm structure (which, incidentally, can be found in ~~/src/Pure/thm.ML; most of these basic ML files are in the Pure directory). However, in my experience, it is usually easier to just convert the cterm to a regular term (using Thm.term_of – unlike Thm.cterm_of, this is a very cheap operation) and then work with the term instead. Directly manipulating cterms only really makes sense if you need another cterm in the end, because re-certifying terms is fairly expensive (still, unless your code is called very often, it probably isn't really a performance problem).
In most cases, I would say the workflow is like this: If you get a cterm as an input, you turn it into a regular term. This you can easily inspect/take apart/whatever. At some point, you might have to turn it into a cterm again (e.g. because you want to instantiate some theorem with it or use it in some other way that involves the kernel) and then you just use Thm.cterm_of to do that.
I don't know exactly what the #{cterm …} antiquotation does internally, but I would imagine that at the end of the day, it just parses its parameter as an Isabelle term and then certifies it with the current context (i.e. #{context}) using something like Thm.cterm_of.

To gather my findings with cterms, I post an answer.
This is how a cterm looks like in Pure:
abstype cterm =
Cterm of {cert: Context.certificate,
t: term, T: typ,
maxidx: int,
sorts: sort Ord_List.T}
(To be continued)

thy_goal_defn and keywords in Isabelle/HOL

I'm trying to get a good understanding of how math is built up in Isabelle. For whatever reason, all the tutorial/manuals hide a lot of the implementation details of basic types such as how the natural numbers, integers, rationals, and reals are constructed. When looking the src/HOL directory and examining the .thy files, I've encountered code blocks such as:
keywords
"print_quotmapsQ3" "print_quotientsQ3" "print_quotconsts" :: diag and
"quotient_type" :: thy_goal_defn and "/" and
"quotient_definition" :: thy_goal_defn
begin
in Quotient.thy. Here, keywords is being used so that later you can define a type as:
quotient_type rat = "int * int" / partial: "ratrel"
and other related definitions. I haven't been able to figure out how the "keywords" feature works. It's not particularly obvious from the code, and the only documentation I can find is in the Isabelle/Isar Reference Manual where the following is written:
"The keywords specification declares outer syntax (chapter 3) that is introduced in this theory later on (rare in end-user applications). Both minor keywords and major keywords of the Isar command lan- guage need to be specified, in order to make parsing of proof docu- ments work properly. Command keywords need to be classified ac- cording to their structural role in the formal text. Examples may be seen in Isabelle/HOL sources itself, such as keywords "typedef" :: thy_goal_defn or keywords "datatype" :: thy_defn for theory-level definitions with and without proof, respectively." (p. 91)
This raises the question what a theory-level definition, which I haven't been able to figure out.

Isabelle's surface language, Isar, is extensible in multiple dimensions. In particular, a significant chunk of the keywords you'd usually use in day-to-day formalizations are defined in userspace. This sets Isar apart from many other programming languages, where syntax is fixed.
Roughly speaking, a theory file in Isabelle consists of two parts:
The header, which can be parsed statically, i.e. without running any custom code.
The contents, where e.g. logical definitions (types, constants, ...) and proofs can be mode.
Parsing of the contents happens in two phases:
First, the command structure is being parsed. This can be done by looking at the table of all the keywords that exist (those are declared in the header). There are various different types of keywords (as the manual points out). Command keywords start a new atomic chunk in the theory. (Consequently, theory contents can be seen as a sequence of commands.)
Second, the commands themselves are parsed, by using custom parsing code specified by whoever declared the corresponding keyword. This will execute any action, e.g. actually defining a type in the theory when the keyword typedef is encountered.
Commands can be classified according to the context in which they can appear. Top-level commands may only appear – well – on the top-level of a theory. Other commands may be freely nested in local contexts. Yet other commands do not modify the theory, but only print diagnostic output (diag). Theory processing in Isabelle takes that into account when e.g. parallelizing execution of a theory.
The example you mentioned, thy_goal_defn, is a keyword that adds some definitions to the theory and also enters proof mode, because quotient_type requires some proofs about the wellformedness of the definition.

Reflecting on a Type parameter

I am trying to create a function
import Language.Reflection
foo : Type -> TT
I tried it by using the reflect tactic:
foo = proof
{
intro t
reflect t
}
but this reflects on the variable t itself:
*SOQuestion> foo
\t => P Bound (UN "t") (TType (UVar 41)) : Type -> TT

Reflection in Idris is a purely syntactic, compile-time only feature. To predict how it will work, you need to know about how Idris converts your program to its core language. Importantly, you won't be able to get ahold of reflected terms at runtime and reconstruct them like you would with Lisp. Here's how your program is compiled:
Internally, Idris creates a hole that will expect something of type Type -> TT.
It runs the proof script for foo in this state. We start with no assumptions and a goal of type Type -> TT. That is, there's a term being constructed which looks like ?rhs : Type => TT . rhs. The ?foo : ty => body syntax shows that there's a hole called foo whose eventual value will be available inside of body.
The step intro t creates a function whose argument is t : Type - this means that we now have a term like ?foo_body : TT . \t : Type => foo_body.
The reflect t step then fills the current hole by taking the term on its right-hand side and converting it to a TT. That term is in fact just a reference to the argument of the function, so you get the variable t. reflect, like all other proof script steps, only has access to the information that is available directly at compile time. Thus, the result of filling in foo_body with the reflection of the term t is P Bound (UN "t") (TType (UVar (-1))).
If you could do what you are wanting here, it would have major consequences both for understanding Idris code and for running it efficiently.
The loss in understanding would come from the inability to use parametricity to reason about the behavior of functions based on their types. All functions would effectively become potentially ad-hoc polymorphic, because they could (say) run differently on lists of strings than on lists of ints.
The loss in performance would come from representing enough type information to do the reflection. After Idris code is compiled, there is no type information left in it (unlike in a system such as the JVM or .NET or a dynamically typed system such as Python, where types have a runtime representation that code can access). In Idris, types can be very large, because they can contain arbitrary programs - this means that far more information would have to be maintained, and computation occurring at the type level would also have to be preserved and repeated at runtime.
If you're wanting to reflect on the structure of a type for further proof automation at compile time, take a look at the applyTactic tactic. Its argument should be a function that takes a reflected context and goal and gives back a new reflected tactic script. An example can be seen in the Data.Vect source.
So I suppose the summary is that Idris can't do what you want, and it probably never will be able to, but you might be able to make progress another way.

Haskell "collections" language design

Why is the Haskell implementation so focused on linked lists?
For example, I know Data.Sequence is more efficient
with most of the list operations (except for the cons operation), and is used a lot;
syntactically, though, it is "hardly supported". Haskell has put a lot of effort into functional abstractions, such as the Functor and the Foldable class, but their syntax is not compatible with that of the default list.
If, in a project I want to optimize and replace my lists with sequences - or if I suddenly want support for infinite collections, and replace my sequences with lists - the resulting code changes are abhorrent.
So I guess my wondering can be made concrete in questions such as:
Why isn't the type of map equal to (Functor f) => (a -> b) -> f a -> f b?
Why can't the [] and (:) functions be used for, for example, the type in Data.Sequence?
I am really hoping there is some explanation for this, that doesn't include the words "backwards compatibility" or "it just grew that way", though if you think there isn't, please let me know. Any relevant language extensions are welcome as well.

Before getting into why, here's a summary of the problem and what you can do about it. The constructors [] and (:) are reserved for lists and cannot be redefined. If you plan to use the same code with multiple data types, then define or choose a type class representing the interface you want to support, and use methods from that class.
Here are some generalized functions that work on both lists and sequences. I don't know of a generalization of (:), but you could write your own.
fmap instead of map
mempty instead of []
mappend instead of (++)
If you plan to do a one-off data type replacement, then you can define your own names for things, and redefine them later.
-- For now, use lists
type List a = [a]
nil = []
cons x xs = x : xs
{- Switch to Seq in the future
-- type List a = Seq a
-- nil = empty
-- cons x xs = x <| xs
-}
Note that [] and (:) are constructors: you can also use them for pattern matching. Pattern matching is specific to one type constructor, so you can't extend a pattern to work on a new data type without rewriting the pattern-matchign code.
Why there's so much list-specific stuff in Haskell
Lists are commonly used to represent sequential computations, rather than data. In an imperative language, you might build a Set with a loop that creates elements and inserts them into the set one by one. In Haskell, you do the same thing by creating a list and then passing the list to Set.fromList. Since lists so closely match this abstraction of computation, they have a place that's unlikely to ever be superseded by another data structure.
The fact remains that some functions are list-specific when they could have been generic. Some common functions like map were made list-specific so that new users would have less to learn. In particular, they provide simpler and (it was decided) more understandable error messages. Since it's possible to use generic functions instead, the problem is really just a syntactic inconvenience. It's worth noting that Haskell language implementations have very little list-speficic code, so new data structures and methods can be just as efficient as the "built-in" ones.
There are several classes that are useful generalizations of lists:
Functor supplies fmap, a generalization of map.
Monoid supplies methods useful for collections with list-like structure. The empty list [] is generalized to other containers by mempty, and list concatenation (++) is generalized to other containers by mappend.
Applicative and Monad supply methods that are useful for interpreting collections as computations.
Traversable and Foldable supply useful methods for running computations over collections.
Of these, only Functor and Monad were in the influential Haskell 98 spec, so the others have been overlooked to varying degrees by library writers, depending on when the library was written and how actively it was maintained. The core libraries have been good about supporting new interfaces.

I remember reading somewhere that map is for lists by default since newcomers to Haskell would be put off if they made a mistake and saw a complex error about "Functors", which they have no idea about. Therefore, they have both map and fmap instead of just map.
EDIT: That "somewhere" is the Monad Reader Issue 13, page 20, footnote 3:
3You might ask why we need a separate map function. Why not just do away with the current
list-only map function, and rename fmap to map instead? Well, that’s a good question. The
usual argument is that someone just learning Haskell, when using map incorrectly, would much
rather see an error about lists than about Functors.
For (:), the (<|) function seems to be a replacement. I have no idea about [].

A nitpick, Data.Sequence isn't more efficient for "list operations", it is more efficient for sequence operations. That said, a lot of the functions in Data.List are really sequence operations. The finger tree inside Data.Sequence has to do quite a bit more work for a cons (<|) equivalent to list (:), and its memory representation is also somewhat larger than a list as it is made from two data types a FingerTree and a Deep.
The extra syntax for lists is fine, it hits the sweet spot at what lists are good at - cons (:) and pattern-matching from the left. Whether or not sequences should have extra syntax is further debate, but as you can get a very long way with lists, and lists are inherently simple, having good syntax is a must.
List isn't an ideal representation for Strings - the memory layout is inefficient as each Char is wrapped with a constructor. This is why ByteStrings were introduced. Although they are laid out as an array ByteStrings have to do a bit of administrative work - [Char] can still be competitive if you are using short strings. In GHC there are language extensions to give ByteStrings more String-like syntax.
The other major lazy functional Clean has always represented strings as byte arrays, but its type system made this more practical - I believe the ByteString library uses unsafePerfomIO under the hood.

With version 7.8, ghc supports overloading list literals, compare the manual. For example, given appropriate IsList instances, you can write
['0' .. '9'] :: Set Char
[1 .. 10] :: Vector Int
[("default",0), (k1,v1)] :: Map String Int
['a' .. 'z'] :: Text
(quoted from the documentation).

I am pretty sure this won't be an answer to your question, but still.
I wish Haskell had more liberal function names(mixfix!) a la Agda. Then, the syntax for list constructors (:,[]) wouldn't have been magic; allowing us to at least hide the list type and use the same tokens for our own types.
The amount of code change while migrating between list and custom sequence types would be minimal then.
About map, you are a bit luckier. You can always hide map, and set it equal to fmap yourself.
import Prelude hiding(map)
map :: (Functor f) => (a -> b) -> f a -> f b
map = fmap
Prelude is great, but it isn't the best part of Haskell.

How can I print polymorphic values in Standard ML?

Is there a way to print polymorphic values in Standard ML (SML/NJ specifically)? I have a polymorphic function that is not doing what I want and due to the abysmal state that is debugging in SML (see Any real world experience debugging a production functional program?), I would like to see what it is doing with some good-ol' print's. A simple example would be (at a prompt):
fun justThisOnce(x : 'a) : 'a = (print(x); x);
justThisOnce(42);
Other suggestions are appreciated. In the meantime I'll keep staring the offending code into submission.
Update
I was able to find the bug but the question still stands in the hopes of preventing future pain and suffering.

No, there is no way to print a polymorphic value.
You have two choices:
Specialize your function to integers or strings, which are readily printed. Then when the bug is slain, make it polymorphic again.
If the bug manifests only with some other instantiation, pass show as an additional argument to your function. So for example, if your polymorphic function has type
'a list -> 'a list
you extend the type to
('a -> string) -> 'a list -> 'a list
You use show internally to print, and then by partially applying the function to a suitable show, you can get a version you can use in the original context.
It's very tedious but it does help. (But be warned: it may drive you to try Haskell.)

Only in MOSML: Merely for debugging purposes, use the printVal function. Note that this function is only available in toplevel mode, it will cause an error when you try to compile your program.
Edit: In that case, I'm afraid there is no general solution, you need to translate your values explicitly to strings, and print those. See other answer for good suggestions.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex