Error message in Isabelle/HOL

When applying the wrong tactic or the wrong deduction rule, the error message is usually too general:
Failed to apply initial proof method
I am using Isabelle to teach natural deduction. When Isabelle complains, some students change the rule/tactic arbitrarily without reflecting on the possible causes of the error. More detailed error messages could be part of the process of learning Isabelle, I think.
How can those error messages be made student-friendly? Does that require editing the source code, or can it be managed by defining more expressive natural-deduction tactics?

Tactics in Isabelle can be thought of as chainable non-deterministic transformations of the goal state. That means that the question of what specifically caused a tactic to fail is difficult to answer in general, and there is no mechanism to track such information in Isabelle's tactic system. However, one could relatively easily modify existing tactics such that they can optionally output some tracing information.
However, I have no idea what this information should be. There are simple tactics such as rule, where the reason for failure is always that the given rule cannot be unified with the goal (and possibly the chained facts), and there are similarly simple tactics like intro, drule, frule, erule, and elim. Such unification-related problems can sometimes be debugged quite well using declare [[unify_trace_failure]], which prints some tracing information every time a unification fails.
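As an illustration, here is a minimal sketch of the kind of failure this option explains (the lemma is a made-up example): applying conjI to a goal whose outermost connective is an implication rather than a conjunction.

    declare [[unify_trace_failure]]

    lemma "A ⟶ A ∧ A"
      apply (rule conjI)
      (* fails: conjI concludes ?P ∧ ?Q, which does not unify with the
         implication A ⟶ A ∧ A; the trace shows which subterms clash *)

Here rule impI would have been the right first step; the trace output points at the mismatch between the two outermost connectives.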
With simp and auto, the situation is much less clear because of how many different things these methods can do. Essentially, when the proof method could not be applied at all, it means that ‘none of the things that simp and auto can do worked for this goal’. For simp, this includes simplification, splitting, linear arithmetic, and probably a lot more things that I forgot. For auto, it additionally includes classical reasoning with a certain search depth. One cannot really say easily what specific thing went wrong when these methods fail.
Some specialised tactics do print more specific error messages if something goes wrong, e.g. sat and smt sometimes print a special error message when they have found a counterexample to the goal, but I cannot even imagine what more helpful output for something like simp or auto would look like. If you have an idea, please do tell me.
I think this problem cannot really be solved with error messages; one must simply get to know the system and the tactics one uses better and understand what they do and when they fail. Perhaps it would be good to have a kind of catalogue of commonly-used tactics that mentions these things.

Related

Sledgehammer within Locale

I am a happy user of Isabelle/Isar and Sledgehammer, but am just now trying to also use locales, as in my use case there are just overwhelming arguments for it.
I am using the Isabelle December 2021 distribution, but most of the time when I try to use sledgehammer within a locale context, I get a message like this:
"cvc4": Prover error:
exception TERM raised (line 457 of "~~/src/HOL/Tools/SMT/smt_translate.ML"): bad SMT term
It is the same message for other provers as well. Is this a well-known problem? Without locales, such a problem only came up when my theory name was confused with some HOL theory name, and renaming my theory was a workaround. Is there something similar at play here? Is there an easy fix? I use sledgehammer a lot, so not being able to use it within a locale would be a severe blow against using locales.
For the sake of completeness, a summary of the discussion on the Isabelle mailing list.
There is an issue in the translation from Isabelle to SMT-LIB (the input language of SMT solvers): lets inside lets are not translated correctly. This should be fixed in the next Isabelle release.
The workaround in the meantime is to avoid nested lets.
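For example (a minimal sketch; the lemma is made up), the nested lets can be unfolded away before calling sledgehammer, since unfolding Let_def removes all let-bindings from the goal:

    lemma "let x = (1 :: nat) in let y = x + 1 in x + y = 3"
      unfolding Let_def  (* the SMT translation no longer sees nested lets *)
      by simp            (* or invoke sledgehammer on the unfolded goal *)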

How can I recover the Pure lambda expression associated with a proof in Isabelle?

When constructing a proof in Isabelle/HOL, we are actually constructing a lambda expression whose type corresponds to the theorem we are trying to prove.
Is there any way to see the raw lambda expression that corresponds to a proved theorem?
I get the feeling you're coming from the world of dependently-typed systems like Coq or Lean. Isabelle is an LCF-style prover, which works quite differently. No information on the proof steps is recorded for performance reasons – the soundness of the system is instead ensured by having a comparatively small and simple kernel that all other code must go through in order to produce theorems.
There is, however, an option to let the Isabelle kernel record ‘proof terms’, which are probably more or less what you are looking for. Look at the HOL-Proofs session in the Isabelle distribution and the following paper:
Proof terms for simply typed higher order logic
(freely accessible version, slides)
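For instance, a minimal sketch, assuming Isabelle is started with the HOL-Proofs session image (e.g. isabelle jedit -l HOL-Proofs) so that full proof terms are recorded:

    lemma my_true: "True"
      by (rule TrueI)

    full_prf my_true  (* displays the full proof term of the theorem *)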
However, this feature is almost never used, and it suffers from poor performance on anything except very small examples.
I am not an expert, so take this with a grain of salt, but my impression is that there are two reasons: 1. this feature has never been considered very important and has therefore not been fully optimised, and 2. proofs in Isabelle tend to use lots of automation, and the proof terms resulting from such automatic procedures are often needlessly blown up and ugly.
Another issue might be (careful, I might be completely mistaken here) that systems like Coq and Lean have the concept of definitional equality and apply such equations implicitly, without recording their application in the proof term at all. Isabelle/HOL, on the other hand, has no such thing (all equalities are alike), and every equational step must therefore be recorded explicitly.
However, there has recently been some new interest in this matter and people are actively working on improving the performance and usability of Isabelle's proof terms. So hopefully the situation will be a bit better in a few years!

Complete proof output in Isabelle

If Isabelle does not find a proof for a lemma, is it possible to output everything that was done by all the proof methods employed, up to the subgoals at which they could not proceed any further? This would help me see at which points they got stuck, which would in turn help me point them in the right direction.
(And also for completed proofs, I would find it interesting to have a complete proof log that shows all the elementary inferences performed to prove a lemma.)
This question sounds similar to this one, which I answered a few days ago. Parts of that answer also apply here. Anyway, to answer this question specifically:
Not really. For most basic proof methods (rule et al., intro, fact, cases, induct), it is relatively straightforward what they do, and when they fail, it is pretty much always because the rule they tried to apply does not unify with the goals/premises they are given (or because they do not know which rule to apply in the first place).
You were probably thinking of the more automatic tactics like blast, force, simp, and auto. Most of them (blast, force, fastforce, fast, metis, meson, best, etc.) are ‘all-or-nothing’: they either solve the subgoal or they do nothing at all. It is therefore a bit tricky to find out where they get stuck, and usually people use auto for this kind of exploration: you apply auto, look at the remaining subgoals, and think about what facts/parameters you could add in order to break those down further.
The situation with simp is similar, except that it does less than auto. simp is the simplifier, which uses term rewriting, custom rewriting procedures called simprocs, certain solvers (e.g. for linear arithmetic), and a few other convenient things like splitters to get rid of if expressions. auto is basically simp combined with classical reasoning, which makes it a lot more powerful than simp, but also less predictable. (and occasionally, auto does too much and thereby turns a provable goal into an unprovable goal)
There are some tracing tools (e.g. the simplifier trace, which is explained here). I thought there was also a way to trace classical reasoning, but I cannot seem to find it anymore; perhaps I was mistaken. In any case, tracing tools can sometimes help to explain unexpected behaviour, but I don't think they are the kind of thing you want here. The better approach is to understand what kinds of things these methods try; then, when simp or auto returns a subgoal, you can look at it, determine what you would have expected the method to do next, figure out why it did not do that (usually because of some missing fact), and fix it.
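For reference, the simplifier trace can be switched on locally, e.g. (a minimal sketch):

    lemma "rev (rev xs) = xs"
      using [[simp_trace]]  (* enable the trace for this proof only *)
      by simp               (* the trace output shows every rewrite step *)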

Do monads do anything other than increase readability and productivity?

I have been looking at monads a lot over the past few months (functors and applicative functors as well). I have been attempting to figure out when monads are useful in a general sense. If I am looking at a piece of code, I ask: should I employ a specific monad or a stack via transformers? In my efforts I think I have found an answer, but I want others' input in case I have missed something. It appears to me that monads are useful for abstracting away specific plumbing to increase the readability/declarative nature of a piece of code, which can have the side effect of increasing productivity by requiring less code. The only exception I can find is the IO monad, which attempts to keep a pure function pure in the face of IO. It doesn't appear that a given monad provides a solution to a problem that can't be achieved via other means. Am I missing something?
Does any feature beyond mere Turing-completeness provide a solution to a problem that can't be achieved via other means? Nope. All Turing-equivalent languages are different ways of expressing the same basic things. Since monads are built out of more fundamental building blocks, obviously that set of building blocks is able to do anything a monad can. When we talk about a language or feature "allowing" us to do something, we mean it allows us to express that thing naturally, in a way that's easy to understand.
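To make that concrete, here is a small Haskell sketch (names are made up): the same two-step lookup written once with explicit failure checks and once in the Maybe monad. Both compute exactly the same thing; the monadic version merely hides the plumbing.

    import qualified Data.Map as M

    -- Explicit plumbing: every step checks for failure by hand.
    grandparent :: M.Map String String -> String -> Maybe String
    grandparent parents name =
      case M.lookup name parents of
        Nothing -> Nothing
        Just p  -> M.lookup p parents

    -- Same logic in the Maybe monad: the failure checks disappear
    -- into the definition of (>>=).
    grandparent' :: M.Map String String -> String -> Maybe String
    grandparent' parents name = do
      p <- M.lookup name parents
      M.lookup p parents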

Should I use an expression parser in my Math game?

I'm writing some children's math-education software for a class.
I'm going to present students of varying skill levels with randomly generated math problems of different types, in fun ways.
One of the frustrations of computer-based math software is its rigidity. If you have ever taken an online math class, you'll know all about the frustration of taking an online quiz and having your correct answer thrown out because it isn't formatted exactly in the expected form, or because of some weird spacing issue.
So, originally I thought, "I know! I'll use an expression parser on the answer box so I'll be able to evaluate anything they enter and even if it isn't in the same form I'll be able to check if it is the same answer." So I fire up my IDE and start implementing the Shunting Yard Algorithm.
This would solve the problem of answers being rejected when fractions aren't in lowest terms, among other issues.
However, it then hit me that a tricky student would simply be able to enter most of the problem into the answer box, and my expression parser would dutifully parse and evaluate it to the correct answer!
So, should I not be using an expression parser in this instance? Do I really have to generate a single form of the answer and do a string comparison?
One possible solution is to note how many steps your expression evaluator takes to evaluate the problem's original expression, and to compare this to the optimal answer. If there's too much difference, then the problem hasn't been reduced enough and you can suggest that the student keep going.
Don't be surprised if students come up with better answers than your own definition of "optimal", though! I was a TA/grader for several classes, and the brightest students routinely had answers on their problem sets that were superior to the ones provided by the professor.
For simple problems where you're looking for an exact answer, then removing whitespace and doing a string compare is reasonable.
For more advanced problems, you might use the Shunting Yard Algorithm (or similar), but parametrize it so you can turn reductions on/off to guard against the tricky student. Note that "simple" answers can still go through the parser; you would just disable all reductions.
For example, on a division question, you'd disable the "/" reduction.
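A minimal Haskell sketch of that idea (all names are hypothetical): evaluation is parametrized by the set of operators that are allowed to reduce, so a division drill simply leaves '/' out of the set, and an answer like 8/2 is rejected instead of evaluated.

    -- Hypothetical parse tree; a real parser would produce this.
    data Expr = Num Rational | Bin Char Expr Expr

    -- Evaluate, but only reduce operators in the enabled set.
    evalWith :: [Char] -> Expr -> Maybe Rational
    evalWith _   (Num n) = Just n
    evalWith ops (Bin op a b)
      | op `notElem` ops = Nothing        -- this reduction is switched off
      | otherwise = do
          x <- evalWith ops a
          y <- evalWith ops b
          case op of
            '+' -> Just (x + y)
            '-' -> Just (x - y)
            '*' -> Just (x * y)
            '/' -> if y == 0 then Nothing else Just (x / y)
            _   -> Nothing

On a division question you would call evalWith "+-*" answer, so a student who types the original problem back in gets no credit.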
This is a great question.
If you are writing an expression system and an evaluation/transformation/equivalence engine (isn't there one available somewhere? I am almost 100% sure there is an open-source one), then it's more of an education/algebra problem: is the student's answer algebraically closer to the original expression or to the expected expression?
I'm not sure how to answer that, but here is an idea (not necessarily practical): perhaps your evaluation engine can count transformation steps to equivalence. If the answer takes fewer steps to reach the expected expression than to reach the original, it might be OK. If it's too close to the original, it's not.
You could use an expression parser, but apply restrictions on the complexity of the expressions permitted in the answer.
For example, if the goal is to reduce (4/5)*(1/2) and you want to allow either (2/5) or (4/10), then you could restrict the set of allowable answers to expressions whose trees take the form (x/y) and which also evaluate to the correct number. Perhaps you would also allow "0.4", i.e. expressions of the form (x) which evaluate to the correct number.
This is exactly what you would (implicitly) be doing if you graded the problem manually -- you would be looking for an answer that is correct but which also falls into an acceptable class.
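A minimal Haskell sketch of that shape check (the Expr type is hypothetical, standing in for whatever the parser produces):

    -- Hypothetical parse tree for student answers.
    data Expr = Num Rational | Add Expr Expr | Mul Expr Expr | Div Expr Expr

    eval :: Expr -> Maybe Rational
    eval (Num n)   = Just n
    eval (Add a b) = (+) <$> eval a <*> eval b
    eval (Mul a b) = (*) <$> eval a <*> eval b
    eval (Div a b) = do
      y <- eval b
      if y == 0 then Nothing else (/ y) <$> eval a

    -- Accept only a bare literal or a single fraction of two literals
    -- that evaluates to the expected value.
    acceptable :: Rational -> Expr -> Bool
    acceptable expected e = simpleShape e && eval e == Just expected
      where
        simpleShape (Num _)               = True
        simpleShape (Div (Num _) (Num _)) = True
        simpleShape _                     = False

Here Div (Num 2) (Num 5) and Div (Num 4) (Num 10) are both accepted for 2/5, while the original Mul (Div (Num 4) (Num 5)) (Div (Num 1) (Num 2)) is rejected on shape alone, even though its value is correct.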
The usual way of doing this in mathematics assessment software is to allow the question setter to specify expressions/strings that are not allowed in a correct answer.
If you happen to be interested in existing software, there's the open-source Stack http://www.stack.bham.ac.uk/ (or various commercial options such as MapleTA). I suspect most of the problems that you'll come across have also been encountered by Stack so even if you don't want to use it, it might be educational to look at how it approaches things.
