using [[simp_trace]] lets you see what simplification rules were used, and using [[unify_trace_failure]] lets you see what unification issues were encountered. I wonder if resolution proofs can be traced in the same way. This would make Isabelle proofs effectively surveyable.
I don't think one can trace auto, blast, etc. step by step or get proof scripts from them (this mailing list thread should be interesting for you). What you can do is get a list of the facts that were used; for more details, see davidgs's answer in this thread.
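For illustration, here is roughly how these options are switched on in practice (a minimal sketch; the lemma is just an arbitrary example that the simplifier can already prove):

    lemma "rev (rev xs) = xs"
      using [[simp_trace]] by simp
      (* the output panel shows each rewrite rule the simplifier applied *)

    declare [[unify_trace_failure]]
    (* from here on, every failed unification prints a trace *)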
When constructing a proof in Isabelle/HOL, we are actually constructing a lambda expression whose type corresponds to the theorem we are trying to prove.
Is there any way to see the raw lambda expression that corresponds to a proved theorem?
I get the feeling you're coming from the world of dependently-typed systems like Coq or Lean. Isabelle is an LCF-style prover, which works quite differently. No information on the proof steps is recorded for performance reasons – the soundness of the system is instead ensured by having a comparatively small and simple kernel that all other code must go through in order to produce theorems.
There is, however, an option to let the Isabelle kernel record ‘proof terms’, which are probably more or less what you are looking for. Look at the HOL-Proofs session in the Isabelle distribution and the following paper:
Proof terms for simply typed higher order logic
(freely accessible version, slides)
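To give a rough idea of what this looks like (a sketch only: proof terms are recorded only when Isabelle is run with the HOL-Proofs session image rather than plain HOL, and the lemma name below is made up):

    theory Scratch
      imports Main  (* must be processed with the HOL-Proofs session image *)
    begin

    lemma my_lemma: "A ⟶ A"  (* "my_lemma" is just an illustrative name *)
      by (rule impI, assumption)

    full_prf my_lemma  (* prints the full proof term, a lambda expression *)

    end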
However, this is a feature that is almost never used, and it suffers from poor performance on anything except very small examples.
There are several reasons for this, and I am not an expert, so take this with a grain of salt. My impression is that 1. this feature has never been considered very important so far and is therefore not fully optimised, and 2. proofs in Isabelle tend to use lots of automation, and the proof terms resulting from such automatic procedures are often needlessly blown up and ugly.
Another issue might be (careful, I might be completely mistaken here) that systems like Coq and Lean have the concept of definitional equality and apply such equations implicitly, without recording their application in the proof term at all. Isabelle/HOL, on the other hand, has no such thing (all equalities are the same), and every application of an equation must therefore be recorded explicitly.
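As a tiny illustration of the contrast (a Lean snippet, since that is the side where definitional equality lives):

    -- Lean: 2 + 2 reduces to 4 by mere computation, so `rfl` proves the
    -- equation without any explicit rewriting step in the proof term.
    example : 2 + 2 = 4 := rfl

An analogous Isabelle proof of 2 + 2 = 4 would go through the simplifier, and each rewrite step would have to appear in the recorded proof term.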
However, there has recently been some new interest in this matter and people are actively working on improving the performance and usability of Isabelle's proof terms. So hopefully the situation will be a bit better in a few years!
I need to do a presentation on a paper which at some point makes use of Isabelle/Isar and Isabelle/HOL.
I tried researching online about Isabelle/HOL and Isabelle/Isar to be able to explain the relations in one or two slides.
Here are the relations as I currently understand them:
Isabelle - provides a generic infrastructure for deductive systems
Based on the Standard ML programming language
provides an IDE which allows you to write theories which can later be proved.
Isabelle/Pure - minimal version of higher-order logic according to this link:
Is it an actual language that can be input into the Isabelle IDE?
Or is it a technical specification?
Isabelle/HOL (Higher-Order Logic):
Is it a library or a language?
How does it relate to Isabelle/Pure?
Is it procedural in nature?
Do tactics only exist in Isabelle/HOL?
Is it LCF (Logic for Computable Functions)?
Isabelle/Isar:
Structured proof language based on Isabelle/Pure
Declarative
Is it an extension of Isabelle/HOL, as stated here?
Do locales only exist in Isabelle/Isar?
What does the Isabelle IDE support by default?
Just feels like I'm getting conflicting information from different sources and would like to sort this out.
Thanks in advance
Edit - Check out this highly related question and Manuel Eberl's answer here: What are all the isabelle/slashes?
As this is an answer to a homework question and I myself only have limited understanding of all parts of the Isabelle project, this answer merely tries to point you in the right direction for at least some parts of your question.
From the Isabelle/Isar reference manual:
The Isabelle system essentially provides a generic infrastructure for building deductive systems (programmed in Standard ML), with a special focus on interactive theorem proving in higher-order logics.
It goes on to introduce Isar:
In contrast Isar provides an interpreted language environment of its own, which has been specifically tailored for the needs of theory and proof development.
[...]
The main concern of Isar is the design of a human-readable structured proof language
Let's try to connect Pure to all of this by looking at publications from Makarius Wenzel regarding the topic:
Thus Isar proof texts may be understood as structured compositions of formal entities of the Pure framework, namely propositions, facts, and goals
In colloquial terms, Pure is the semantic foundation. Isar is a language that "follows" these semantics and provides syntax for them. Isabelle is just (one of the) platforms it all runs on.
Some of your confusion around the distinction between Pure and Isar seems to stem from the fact that the Isabelle Pure source code defines, or at least seems to define, both the semantics (Pure) and the syntax (Isar) in one go:
(* The Pure theory, with definitions of Isar commands and some lemmas. *)
In my humble opinion, this might be related to your understanding of syntax, semantics and "implementations" of the two. "Pure" outside of computers or paper is just semantics and thus, like math, just a thing in our brains. Give it syntax and you can put it on paper or type it into a machine. For the machine to be able to process your text (since this is ultimately what we are after), it needs an implementation: some framework telling it how to read the syntax and how to then process it. This framework is Isabelle. On top of Isabelle, there is Isabelle/Pure, which defines the semantics (the processing), and Isabelle/Isar, which defines the syntax. For practical reasons, Isabelle's Pure implementation already provides the Isar syntax in one go.
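A tiny example may make this concrete (a sketch; P is just an arbitrary predicate): the proposition below is built from Pure's meta-connectives ⋀ and ⟹, while the keywords lemma, proof, fix, assume, show and qed are Isar syntax.

    lemma "⋀x. P x ⟹ P x"
    proof -
      fix x
      assume "P x"
      then show "P x" .
    qed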
From all of this, you might be able to figure HOL out yourself!
Some more references:
The Isabelle/Isar Implementation
I see there is an option in the Alloy Analyzer to allow recursion up to a certain depth (1-3).
But what happens when a counterexample cannot be found because of the limited recursion depth?
Will there be an error or a warning, or are such counterexamples silently ignored?
Alloy basically does not support recursion. When it encounters recursion, it unrolls the code the maximum number of times. Therefore, if it cannot find a solution, it just, well, cannot find a solution. It could only report an error if it knew there was a potential solution beyond the unrolling depth, but knowing that would amount to solving the original problem.
This is, imho, one of the weakest spots in Alloy. Recursion is extremely important in almost all specifications.
When applying the wrong tactic or the wrong deduction rule, the error message is usually too general:
Failed to apply initial proof method
I am using Isabelle to teach natural deduction. When Isabelle complains, some students change the rule/tactic arbitrarily without reflecting on the possible causes of the error. A more detailed error message could be part of the learning process of Isabelle, I think.
How can those error messages be made student-friendly? Does that require editing the source code, or can it be managed by defining more expressive tactics for natural deduction?
Tactics in Isabelle can be thought of as chainable non-deterministic transformations of the goal state. That means that the question of what specifically caused a tactic to fail is difficult to answer in general, and there is no mechanism to track such information in Isabelle's tactic system. However, one could relatively easily modify existing tactics such that they can optionally output some tracing information.
However, I have no idea what this information should be. There are simple tactics such as rule where the reason why applying it fails is always that the rule it is given cannot be unified with the goal (and possibly chained facts), and there are similarly simple tactics like intro, drule, frule, erule, and elim. Such unification-related problems can sometimes be debugged quite well using declare [[unify_trace_failure]], which prints some tracing information every time a unification fails.
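For instance, one can provoke such a trace deliberately (a sketch; the failing step is intentional):

    lemma "A ∧ B ⟶ A"
      using [[unify_trace_failure]]
      apply (rule conjI)
      (* this step fails: the conclusion ?P ∧ ?Q of conjI cannot be
         unified with the implication A ∧ B ⟶ A, and the option makes
         Isabelle print the clash instead of failing silently *)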
With simp and auto, the situation is much less clear because of how many different things these methods can do. Essentially, when the proof method could not be applied at all, it means that ‘none of the things that simp and auto can do worked for this goal’. For simp, this includes simplification, splitting, linear arithmetic, and probably a lot more things that I forgot. For auto, it additionally includes classical reasoning with a certain search depth. One cannot really say easily what specific thing went wrong when these methods fail.
Some specialised tactics do print more specific error messages if something goes wrong, e.g. sat and smt sometimes print a special error message when they have found a counterexample to the goal, but I cannot even imagine what more helpful output for something like simp or auto would look like. If you have an idea, please do tell me.
I think this problem cannot really be solved with error messages; one must simply get to know the system and the tactics one uses better and understand what they do and when they fail. Perhaps it would be good to have a kind of catalogue of commonly-used tactics that mentions these things.
If Isabelle did not find a proof for a lemma, is it possible to output everything that was done by all the proof methods that were employed, up to the subgoals at which they couldn't proceed any further? This would help me see where they got stuck, which would then help me point them in the right direction.
(And also for completed proofs I would find it interesting to have a complete proof log that shows all the elementary inferences that were performed to prove some lemma.)
This question sounds similar to this one, which I answered a few days ago. Parts of that answer also apply here. Anyway, to answer this question specifically:
Not really. For most basic proof methods (rule et al., intro, fact, cases, induct) it is relatively straightforward what they do and when they fail: it is pretty much always because the rule they tried to apply does not unify with the goals/premises they are given (or because they don't know which rule to apply in the first place).
You were probably thinking of the more automatic tactics like blast, force, simp, and auto. Most of them (blast, force, fastforce, fast, metis, meson, best, etc.) are ‘all-or-nothing’: they either solve the subgoal or they do nothing at all. It is therefore a bit tricky to find out where they get stuck, and usually people use auto for this kind of exploration: you apply auto, look at the remaining subgoals, and think about what facts/parameters you could add in order to break those down further.
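As a concrete sketch of that workflow (this is essentially the classic itrev example from the Isabelle tutorial):

    fun itrev :: "'a list ⇒ 'a list ⇒ 'a list" where
      "itrev [] ys = ys"
    | "itrev (x # xs) ys = itrev xs (x # ys)"

    lemma "itrev xs [] = rev xs"
      apply (induct xs)
       apply auto
      (* auto closes the base case but leaves the step case, roughly
         itrev xs [a] = rev xs @ [a], which shows that the induction
         hypothesis is too weak; generalising the second argument of
         itrev (over all ys) fixes this *)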
The situation with simp is similar, except that it does less than auto. simp is the simplifier, which uses term rewriting, custom rewriting procedures called simprocs, certain solvers (e.g. for linear arithmetic), and a few other convenient things like splitters to get rid of if expressions. auto is basically simp combined with classical reasoning, which makes it a lot more powerful than simp, but also less predictable. (and occasionally, auto does too much and thereby turns a provable goal into an unprovable goal)
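A small example of the difference (a sketch; on this purely logical goal, rewriting alone makes no progress, while classical reasoning succeeds):

    lemma "(∃x. ∀y. P x y) ⟶ (∀y. ∃x. P x y)"
      (* "apply simp" fails here: there is nothing to rewrite *)
      by auto  (* succeeds, thanks to the classical reasoner *)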
There are some tracing tools (e.g. the simplifier trace, which is explained here). I thought there was also a way to trace classical reasoning, but I cannot seem to find it anymore; perhaps I was mistaken. In any case, tracing tools can sometimes help to explain unexpected behaviour, but I don't think they are the kind of thing you want to use here. The better approach is to understand what kinds of things these methods try; then, when simp or auto returns a subgoal, you can look at it, determine what you would have expected simp or auto to do next and why it didn't do that (usually because of some missing fact), and fix it.