Coq verification of factorial program through two implementations - functional-programming

I am a newbie to Coq and I am trying to verify the functionality of a factorial program.
From my understanding, what I should do is follow the standard Hoare logic paradigm: start from the precondition, figure out the loop invariant, and reason towards the postcondition. Something like this:
{{ X = m }}
{{ FOL 1 }}
Y ::= 1;;
{{ FOL 2 }}
WHILE !(X = 0) DO
{{ FOL 3 }}
Y ::= Y * X;;
{{ FOL 4 }}
X ::= X - 1
{{ FOL 5 }}
END
{{ FOL 6 }}
{{ Y = m! }}
Here FOL stands for "first-order logic".
However, to my surprise, it seems that when verifying the factorial program with Coq, the common way is to define the following two functions fact and fact_tr:
Fixpoint fact (n:nat) :=
match n with
| 0 => 1
| S k => n * (fact k)
end.
Fixpoint fact_tr_acc (n:nat) (acc:nat) :=
match n with
| 0 => acc
| S k => fact_tr_acc k (n * acc)
end.
Definition fact_tr (n:nat) :=
fact_tr_acc n 1.
and then prove the equivalence of these two functions:
Theorem fact_tr_correct : forall n:nat,
fact_tr n = fact n.
I learned this approach from here and here.
So here is my question:
Can someone illustrate the motivation behind such an "equality-based" verification approach? Is it still conceptually similar to standard Hoare-logic-based reasoning?
Also, can I use Coq to verify the correctness of the factorial program following the "standard" Hoare logic approach? Say, by specifying the precondition and postcondition and inductively reasoning about the whole program.

Notice that the underlying language of Coq's programs belongs to the family of (dependently-typed) functional languages, not imperative ones. Roughly, there is no state and statements, only expressions.
The motivation behind the "equality-based" approach is that simple functional programs can serve as specifications. And fact is certainly simple -- it is Coq speak for the definition of factorial via its fundamental recurrence relation. In other words, fact is a reference implementation, i.e. in this case an obviously correct one, while fact_tr_acc is an optimized implementation whose correctness with respect to that specification we wish to establish.
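For a flavour of how that equivalence proof typically goes, here is one possible sketch (the helper lemma and its name are ours, not part of the question, and it relies on the ring tactic for arithmetic on nat): one first generalizes the statement over the accumulator, then instantiates it with 1.
Require Import Arith.

Lemma fact_tr_acc_correct : forall n acc,
  fact_tr_acc n acc = fact n * acc.
Proof.
  induction n as [| k IH]; intros acc; simpl.
  - (* n = 0: both sides reduce to acc *)
    ring.
  - (* n = S k: apply the induction hypothesis at the larger accumulator *)
    rewrite IH. ring.
Qed.

Theorem fact_tr_correct : forall n:nat,
  fact_tr n = fact n.
Proof.
  intro n. unfold fact_tr.
  rewrite fact_tr_acc_correct. apply Nat.mul_1_r.
Qed.
The accumulator generalization is the functional analogue of strengthening a loop invariant: the plain statement fact_tr_acc n 1 = fact n is not directly provable by induction, but the generalized one is.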
Yes, you can still verify the correctness of the imperative factorial program. E.g. the Software Foundations series shows how to encode imperative programs in Coq and verify their correctness using Hoare logic. See, in particular, the factorial exercise.

Related

How do you lookup the definition or implementation of Coq proof tactics?

I am looking at this:
Theorem eq_add_1 : forall n m,
n + m == 1 -> n == 1 /\ m == 0 \/ n == 0 /\ m == 1.
Proof.
intros n m. rewrite one_succ. intro H.
assert (H1 : exists p, n + m == S p) by now exists 0.
apply eq_add_succ in H1. destruct H1 as [[n' H1] | [m' H1]].
left. rewrite H1 in H; rewrite add_succ_l in H; apply succ_inj in H.
apply eq_add_0 in H. destruct H as [H2 H3]; rewrite H2 in H1; now split.
right. rewrite H1 in H; rewrite add_succ_r in H; apply succ_inj in H.
apply eq_add_0 in H. destruct H as [H2 H3]; rewrite H3 in H1; now split.
Qed.
How do I find out what things like intros or destruct mean exactly, for example by looking up an implementation (or, if that is not possible, a definition)? What is the way to do this efficiently?
The answer differs for primitive and user-defined tactics. However, the proof script you show uses almost no user-defined tactics, except now, which is a notation for the easy tactic.
If you're not sure if a tactic is primitive, try both; checking the manual might be the simplest first step.
For user-defined tactics:
For tactics defined as Ltac foo args := body. you can use Print Ltac foo (e.g. Print Ltac easy.). AFAIK, that does not work for tactics defined by Tactic Notation. In both cases, I prefer to look at the sources (which I find via grep).
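For example, to inspect the easy tactic that the now notation expands to (Locate Ltac is another standard vernacular command; it reports where a tactic is defined, here Coq.Init.Tactics):
Print Ltac easy.    (* prints the Ltac definition of easy *)
Locate Ltac easy.   (* reports where easy is defined *)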
For primitive tactics:
There is the Coq reference manual (https://coq.inria.fr/distrib/current/refman/coq-tacindex.html), which does not have a complete specification but is usually the closest approximation. It’s not very accessible, so you should first refer to one of the many Coq tutorials or introductions, like Software Foundations.
There is the actual Coq implementation, but that’s not very helpful unless you’re a Coq implementer.
As Blaisorblade mentioned, it can be difficult to understand exactly what tactics are doing, and it is easier to look at the reference manual to find out how to use them. However, at a conceptual level, tactics are not that complicated. Via the Curry-Howard correspondence, Coq proofs are represented using the same functional language you use to write regular programs. Tactics like intros or destruct are just metaprograms that write programs in this language.
For instance, suppose that you need to prove A -> B. This means that you need to write a program with a function type A -> B. When you write intros H., Coq builds a partial proof fun (H : A) => ?, where the ? denotes a hole that should have type B. Similarly, destruct adds a match expression to your proof to do case analysis, and asks you to produce proofs for each match branch. As you add more tactics, you keep filling in these holes until you have a complete proof.
The language of Coq is quite minimal, so there is only so much that tactics can do to build a proof: function application and abstraction, constructors, pattern matching, etc. In principle, it would be possible to have only a handful of tactics, one for each syntactic construct in Coq, and this would allow you to build any proof! The reason we don't do this is that using these core constructs directly is too low level, and tactics use automated proof search, unification and other features to simplify the process of writing a proof.
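To make that concrete, here is a tiny, hypothetical example: the same implication proved once with a tactic script and once by writing the proof term directly. Printing the first proof shows that the script built essentially the same program as the hand-written one.
Lemma modus_ponens : forall A B : Prop, (A -> B) -> A -> B.
Proof.
  intros A B H a.   (* builds  fun (A B : Prop) (H : A -> B) (a : A) => ?  with ? : B *)
  apply H.          (* refines the hole to  H ?  with a new hole ? : A *)
  exact a.          (* fills the remaining hole *)
Qed.

(* The same proof written directly as a program in Coq's term language: *)
Definition modus_ponens' : forall A B : Prop, (A -> B) -> A -> B :=
  fun A B H a => H a.

(* Print modus_ponens.  displays the term that the tactic script constructed. *)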

Decidable equality in Agda with less than n^2 cases?

I've got a datatype for the AST of a programming language that I'd like to reason about, but there are about 10 different constructors for the AST.
data Term : Set where
  UnitTerm : Term
  VarTerm : Var -> Term
  ...
  SeqTerm : Term -> Term -> Term
I'm trying to write a function that has decidable equality for syntax trees of this language. In theory this is straightforward: there's nothing too complicated, it's just simple data being stored in the AST.
The problem is that writing such a function seems to require about 100 cases: for each constructor, there are 10 cases.
eqDecide : (x : Term) -> (y : Term) -> Dec (x ≡ y)
eqDecide UnitTerm UnitTerm = yes refl
eqDecide UnitTerm (VarTerm x) = Generic.no (λ ())
...
eqDecide UnitTerm (SeqTerm t1 t2) = Generic.no (λ ())
eqDecide (VarTerm x) UnitTerm = Generic.no (λ ())
...
The problem is, there are a bunch of redundant cases. After the first pattern match where the constructors match, ideally I could match with underscore, since no possible other constructor could unify, but it doesn't appear that I can do that.
I've tried and failed to use this library to derive the equality: I'm running into problems with strict positivity, as well as getting some general errors that are pretty hard to debug. The Agda Prelude also has some facility for this, but it looks pretty immature and is lacking some things that I need from the standard library.
How do people do decidable equality in practice? Do they suck it up and just write all 100 cases, or is there a trick I'm missing? Is this just a place where the newness of the language is showing through?
If you want to avoid using reflection and still prove decidable equality in a linear number of cases, you could try the following approach:
Define a function embed : Term → Nat (or to some other type for which decidable equality is easier to prove such as labelled trees).
Prove that your function is indeed injective.
Make use of the fact that your function is injective together with decidable equality on the result type to conclude decidable equality on Terms (see for example via-injection in the module Relation.Nullary.Decidable in the standard library).
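Here is a toy sketch of these three steps. It is written in Coq rather than Agda (matching the Coq code used elsewhere on this page), and the AST, the names embed, embed_inj and Term_eq_dec are ours, standing in for the real Term and for Agda's via-injection; in Agda the structure would be the same. Note that the case analysis for injectivity is still quadratic internally, but it is discharged by a single combined tactic rather than written out by hand.
Require Import Arith.

(* A toy stand-in for Term; the real AST has ~10 constructors. *)
Inductive Term : Set :=
| UnitTerm  : Term
| TrueTerm  : Term
| FalseTerm : Term.

(* Step 1: inject Term into a type that already has decidable equality. *)
Definition embed (t : Term) : nat :=
  match t with
  | UnitTerm  => 0
  | TrueTerm  => 1
  | FalseTerm => 2
  end.

(* Step 2: the injection is injective. *)
Lemma embed_inj : forall s t, embed s = embed t -> s = t.
Proof. intros [] [] H; simpl in H; congruence. Qed.

(* Step 3: decidable equality on Term from decidable equality on nat. *)
Definition Term_eq_dec (s t : Term) : {s = t} + {s <> t}.
Proof.
  destruct (Nat.eq_dec (embed s) (embed t)) as [E | NE].
  - left. apply embed_inj. exact E.
  - right. intro E. apply NE. rewrite E. reflexivity.
Defined.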

Idris vectors vs linked lists

Does Idris do any kind of optimization under the hood of vectors? Because from the looks of it, an Idris vector is just a linked list with known size (known at compile time). In fact, in general it seems like you could express the following equivalence (I'm guessing a bit at the syntax):
Vector : Nat -> Type -> Type
Vector n t = (l: List t ** length l = n)
So while this is nice in the sense of preventing range errors, the real advantage of vectors (in the traditional usage of the term) is in terms of performance; in particular, O(1) random access. It seems that the Idris vector would not support this (how would you write the indexing function to have this performance?).
Assuming that there's no wizardry going on under the hood (as happens with Nat) to reconfigure Vectors, is there a random-access data type in Idris?
How would be/is such a thing defined in an algebraic type system? Certainly it seems like it would be impossible to define it inductively.
Is it possible, within a type system like that of Idris, to create a data type which supports O(1) random access, and is aware of its length such that all access is provably valid? (Haskell has array-style Vectors, but their concrete implementation is opaque to the average user, including me)
It doesn't do anything to optimise Vector lookups (at the time of writing this answer, at least).
This isn't because of any difficulty in doing it, really, but more because I would rather have some kind of general framework for writing this kind of optimisation than hard coding lots of them. Admittedly, we already have hard coded optimisations for Nat, but I still would prefer not to add loads more in an ad-hoc fashion.
Depending on what you actually want it for, it might be that the experimental uniqueness type system will help, in that you could have a low level mutable thing under the hood and still have safe and efficient access and update in a pure style in the high level language. We'll see...
Edwin has the definitive answers on what Idris currently does. However, if you are looking for something which might be natural to optimize into constant-time lookup in some cases, the following might be a step in the right direction.
For compile-time fixed-size vectors (i.e., not under a lambda, not parameterized by length at top-level), the following structure gives you vectors and lookup functions that, for any fixed concrete length, can be compile-time normalized to functions that should be somewhat straightforwardly optimizable into constant-time functions. (Sorry, code is in Coq; I don't have a working version of Idris at the moment, and don't know it well. I'm happy to replace this with Idris code if someone suggests the right syntax, e.g., in a comment.)
Fixpoint vector (n : nat) (A : Type) :=
  match n return Type with
  | 0 => unit
  | S n' => (A * vector n' A)%type
  end.
Definition nil {A} : vector 0 A := tt.
Definition cons {n} {A : Type} (x : A) (xs : vector n A) : vector (S n) A
  := (x, xs).
Fixpoint get {n} {A : Type} (m : nat) (default : A) (v : vector n A) {struct n} : A
  := match n as n return vector n A -> A with
     | 0 => fun _ => default
     | S n' => match m with
               | 0 => fun v => fst v
               | S m' => fun v => @get n' A m' default (snd v)
               end
     end v.
The idea is that, for any fixed n, the normal form of get is non-recursive, so the compiler could, hypothetically, compile it into a function whose runtime is independent of what n happens to be.
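To see this concretely, here is a small check using the definitions above (v3 is just a hypothetical example name):
(* With a concrete length, lookup normalizes away the recursion entirely. *)
Definition v3 : vector 3 nat := cons 1 (cons 2 (cons 3 nil)).

Eval compute in get 1 0 v3.
(* = 2 : nat *)

Eval compute in (fun v : vector 3 nat => get 1 0 v).
(* normalizes to a constant-depth projection, roughly  fun v => fst (snd v),
   with no recursion left to execute at run time *)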

Define natural numbers in functional languages like Ada subtypes

In Ada to define natural numbers you can write this:
subtype Natural is Integer range 0 .. Integer'Last;
This is type-safe and it is checked at compile-time. It is simple (one line of code) and efficient (it does not use recursion to define natural numbers as many functional languages do). Is there any functional language that can provide similar functionality?
Thanks
This is type-safe and it is checked at compile-time.
As you already pointed out in the comments to your question, it is not checked at compile time. Neither is equivalent functionality in Modula-2 or any other production-ready, general-purpose programming language.
The ability to check constraints like this at compile time is something that requires dependent types, refinement types or similar constructs. You can find those kinds of features in theorem provers like Coq or Agda or in experimental/academic languages like ATS or Liquid Haskell.
Now, of the languages I mentioned, Coq and Agda define their Nat types recursively, so that's not what you want, and ATS is an imperative language. So that leaves Liquid Haskell (plus languages that I didn't mention, of course). Liquid Haskell is Haskell with extra type annotations, so it's definitely a functional language. In Liquid Haskell you can define a MyNat type (a type named Nat is already defined in the standard library) like this:
{-@ type MyNat = {n:Integer | n >= 0} @-}
And then use it like this:
{-@ fac :: MyNat -> MyNat @-}
fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n-1)
If you then try to call fac with a negative number as the argument, you'll get a compilation error. You will also get a compilation error if you call it with user input as the argument unless you specifically check that the input was non-negative. You would also get a compilation error if you removed the fac 0 = 1 line because now n on the next line could be 0, making n-1 negative when you call fac (n-1), so the compiler would reject that.
It should be noted that even with state-of-the-art type inference techniques non-trivial programs in languages like this will end up having quite complicated type signatures and you'll spend a lot of time and effort chasing type errors through an increasingly complex jungle of type signatures having only incomprehensible type errors to guide you. So there's a price for the safety that features like these offer you. It should also be pointed out that, in a Turing complete language, you will occasionally have to write runtime checks for cases that you know can't happen as the compiler can't prove everything even when you think it should.
Typed Racket, a typed dialect of Racket, has a rich set of numeric subtypes and it knows about a fair number of closure properties (e.g., the sum of two nonnegative numbers is nonnegative, the sum of two exact integers is an exact integer, etc.). Here's a simple example:
#lang typed/racket
(: f : (Nonnegative-Integer Nonnegative-Integer -> Positive-Integer))
(define (f x y)
(+ x y 1))
Type checking is done statically, but of course the typechecker is not able to prove every true fact about numeric subtypes. For example, the following function in fact only returns values of type Nonnegative-Integer, but the type rules for subtraction only allow TR to conclude the result type of Integer.
> (lambda: ([x : Nonnegative-Integer] [y : Nonnegative-Integer])
(- x (- x y)))
- : (Nonnegative-Integer Nonnegative-Integer -> Integer)
#<procedure>
The Typed Racket approach to numbers is described in Typing the Numeric Tower by St-Amour et al (appeared at PADL 2012). There's usually a link to the paper here, but the link seems to be broken at the moment. Google has a cached rendering of the PDF as HTML, if you search for the title.

In pure functional languages, is data (strings, ints, floats.. ) also just functions?

I was thinking about pure object-oriented languages like Ruby, where everything, including numbers, ints, floats, and strings, is itself an object. Is it the same in pure functional languages? For example, in Haskell, are numbers and strings also functions?
I know Haskell is based on lambda calculus, which represents everything, including data and operations, as functions. It would seem logical to me that a "purely functional language" would model everything as a function, as well as keep with the definition that a function must always return the same output for the same inputs and has no state.
It's okay to think about that theoretically, but...
Just like in Ruby not everything is an object (argument lists, for instance, are not objects), not everything in Haskell is a function.
For more reference, check out this neat post: http://conal.net/blog/posts/everything-is-a-function-in-haskell
@wrhall gives a good answer. However, you are somewhat correct that in the pure lambda calculus it is consistent for everything to be a function, and the language is Turing-complete (capable of expressing any pure computation that Haskell, etc. is).
That gives you some very strange things, since the only thing you can do to anything is to apply it to something else. When do you ever get to observe something? If you have some value f and want to know something about it, your only choice is to apply it to some value x to get f x, which is another function, and your only choice then is to apply it to another value y, to get f x y, and so on.
Often I interpret the pure lambda calculus as talking about transformations on things that are not functions, but only capable of expressing functions itself. That is, I can make a function (with a bit of Haskelly syntax sugar for recursion & let):
purePlus = \zero succ natCase ->
    let plus = \m n -> natCase m n (\m' -> succ (plus m' n))
    in plus (succ (succ zero)) (succ (succ zero))
Here I have expressed the computation 2+2 without needing to know that there are such things as non-functions. I simply took what I needed as arguments to the function I was defining, and the values of those arguments could be church encodings or they could be "real" numbers (whatever that means) -- my definition does not care.
And you could think the same thing of Haskell. There is no particular reason to think that there are things which are not functions, nor is there a particular reason to think that everything is a function. But Haskell's type system at least prevents you from applying a number to an argument (anybody thinking about fromInteger right now needs to hold their tongue! :-). In the above interpretation, it is because numbers are not necessarily modeled as functions, so you can't necessarily apply arguments to them.
In case it isn't clear by now, this whole answer has been somewhat of a technical/philosophical digression, and the easy answer to your question is "no, not everything is a function in functional languages". Functions are the things you can apply arguments to, that's all.
The "pure" in "pure functional" refers to the "freedom from side effects" kind of purity. It has little relation to the meaning of "pure" being used when people talk about a "pure object-oriented language", which simply means that the language manipulates purely (only) in objects.
The reason is that pure-as-in-only is a reasonable distinction to use to classify object-oriented languages, because there are languages like Java and C++, which clearly have values that don't have all that much in common with objects, and there are also languages like Python and Ruby, for which it can be argued that every value is an object [1].
Whereas for functional languages, there are no practical languages which are "pure functional" in the sense that every value the language can manipulate is a function. It's certainly possible to program in such a language. The most basic versions of the lambda calculus don't have any notion of things that are not functions, but you can still do arbitrary computation with them by coming up with ways of representing the things you want to compute on as functions. [2]
But while the simplicity and minimalism of the lambda calculus tends to be great for proving things about programming, actually writing substantial programs in such a "raw" programming language is awkward. The function representation of basic things like numbers also tends to be very inefficient to implement on actual physical machines.
But there is a very important distinction between languages that encourage a functional style but allow untracked side effects anywhere, and ones that actually enforce that your functions are "pure" functions (similar to mathematical functions). Object-oriented programming is very strongly wed to the use of impure computations [3], so there are no practical object-oriented programming languages that are pure in this sense.
So the "pure" in "pure functional language" means something very different from the "pure" in "pure object-oriented language". [4] In each case the "pure vs not pure" distinction is one that is completely uninteresting applied to the other kind of language, so there's no very strong motive to standardise the use of the term.
[1] There are corner cases to pick at in all "pure object-oriented" languages that I know of, but that's not really very interesting. It's clear that the object metaphor goes much further in languages in which 1 is an instance of some class, and that class can be sub-classed, than it does in languages in which 1 is something other than an object.
[2] All computation is about representation anyway. Computers don't know anything about numbers or anything else. They just have bit-patterns that we use to represent numbers, and operations on bit-patterns that happen to correspond to operations on numbers (because we designed them so that they would).
[3] This isn't fundamental either. You could design a "pure" object-oriented language that was pure in this sense. I tend to write most of my OO code to be pure anyway.
[4] If this seems obtuse, you might reflect that the terms "functional", "object", and "language" have vastly different meanings in other contexts also.
A very different angle on this question: all sorts of data in Haskell can be represented as functions, using a technique called Church encodings. This is a form of inversion of control: instead of passing data to functions that consume it, you hide the data inside a set of closures, and to consume it you pass in callbacks describing what to do with this data.
Any program that uses lists, for example, can be translated into a program that uses functions instead of lists:
-- | A list corresponds to a function of this type:
type ChurchList a r = (a -> r -> r) -- ^ how to handle a cons cell
                   -> r             -- ^ how to handle the empty list
                   -> r             -- ^ result of processing the list
listToCPS :: [a] -> ChurchList a r
listToCPS xs = \f z -> foldr f z xs
That function is taking a concrete list as its starting point, but that's not necessary. You can build up ChurchList functions out of just pure functions:
-- | The empty 'ChurchList'.
nil :: ChurchList a r
nil = \f z -> z
-- | Add an element at the front of a 'ChurchList'.
cons :: a -> ChurchList a r -> ChurchList a r
cons x xs = \f z -> f x (xs f z)
foldChurchList :: (a -> r -> r) -> r -> ChurchList a r -> r
foldChurchList f z xs = xs f z
mapChurchList :: (a -> b) -> ChurchList a r -> ChurchList b r
mapChurchList f = foldChurchList step nil
  where step x = cons (f x)
filterChurchList :: (a -> Bool) -> ChurchList a r -> ChurchList a r
filterChurchList pred = foldChurchList step nil
  where step x xs = if pred x then cons x xs else xs
That last function uses Bool, but of course we can replace Bool with functions as well:
-- | A Bool can be represented as a function that chooses between two
-- given alternatives.
type ChurchBool r = r -> r -> r
true, false :: ChurchBool r
true a _ = a
false _ b = b
filterChurchList' :: (a -> ChurchBool r) -> ChurchList a r -> ChurchList a r
filterChurchList' pred = foldChurchList step nil
  where step x xs = pred x (cons x xs) xs
This sort of transformation can be done for basically any type, so in theory, you could get rid of all "value" types in Haskell, and keep only the () type, the (->) and IO type constructors, return and >>= for IO, and a suitable set of IO primitives. This would obviously be hella impractical—and it would perform worse (try writing tailChurchList :: ChurchList a r -> ChurchList a r for a taste).
Is getChar :: IO Char a function or not? The Haskell Report doesn't provide us with a definition, but it states that getChar is a function (see here). (Well, at least we can say that it is a function.)
So I think the answer is YES.
I don't think there can be a correct definition of "function" other than "everything is a function". (What is a "correct definition"? Good question...) Consider the following example:
{-# LANGUAGE NoMonomorphismRestriction #-}
import Control.Applicative
f :: Applicative f => f Int
f = pure 1
g1 :: Maybe Int
g1 = f
g2 :: Int -> Int
g2 = f
Is f a function or datatype? It depends.
