I need a working tutorial on Isabelle and Sledgehammer

I installed the latest Isabelle/jEdit package on my Windows computer, using the official package from the Isabelle website. Then I opened the tutorials on the Isabelle webpage and started experimenting. I realized immediately that many examples presented in these tutorials simply do not work in my installation!
For example, on page 5 of this tutorial the following very basic Isabelle example is presented:
theory Test imports Main begin
lemma "[a]=[b] ⟹ a=b"
sledgehammer
None of this worked in my installation!
Isabelle complained that the formula "[a]═[b] ⟹ a=b" has an inner lexical error: "Failed to parse prop"
When I replaced this formula with a working one, "(A & B) ⟹ A", the command 'sledgehammer' did not work. I received this message in the goals field:
goal (1 subgoal)
( A&B ⟹ A) &&& sledgehammer
instead of this strange sledgehammer being applied.
An attempt to call sledgehammer via the jEdit button also failed; I received the error message "Missing print function 'sledgehammer'"
So, my question to the community: is there any WORKING tutorial for the latest version of Isabelle with Sledgehammer? Or at least a collection of WORKING examples at the beginner level?

Both examples work without any problems on my installation of Isabelle 2015 (and I should be very surprised if it were any different on Isabelle 2014 or 2013).
The reason why Isabelle gives you a lexical error on [a]═[b] ⟹ a=b is that the first equals sign in that formula is not an equals sign, but a box-drawing character. I have no idea how you managed to get that character in there, but if you replace it with an equals sign, it works.
In the second example, it appears that somehow, ‘sledgehammer’ was parsed as a term, not as a command. I have no idea how that can be (especially not without seeing the exact code you entered), but the following works fine on my installation:
theory Scratch
imports Main
begin
lemma "(A & B) ⟹ A"
sledgehammer
EDIT: Ah, I failed to see that you are using Isabelle for Windows. I have no experience with Isabelle for Windows, but I would guess that there is some problem with Sledgehammer and Isabelle on Windows in your case. If the example I printed before does not work on your installation, I suggest you write an email to the mailing list and report the problem, because it absolutely should work.
In any case, from what I have heard, Isabelle on Windows is somewhat painful and one should ideally use Linux or Mac OS.
On the other hand, Sledgehammer is by no means an essential component of Isabelle. You can easily use and learn Isabelle without Sledgehammer. As a tutorial, I can recommend the book Concrete Semantics. (free PDF version)

Update: Added a screenshot and a related link to the Isabelle User's List.
The purpose of the example in the Sledgehammer tutorial is to make sure your setup is working. The problem is that your setup is not working, not that the tutorial is not "WORKING".
I copied and pasted the following:
lemma "[a]=[b] ⟹ a=b"
sledgehammer
It works.
I've been using Sledgehammer extensively for over 3 years in Windows 7. It works now. It worked for me from the beginning. I've just started with Isabelle2015, but what I show below indicates everything is normal.
I pasted in this:
lemma "(A & B) ⟹ A" sledgehammer
It works. It returns
Try this: by simp (0.0 ms)
for z3, spass, cvc4, e, remote_vampire, and remote_e_sine.
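For reference, a complete theory file built around that suggestion might look like the sketch below. It assumes the file is saved as Scratch.thy (the theory name has to match the file name); the name Scratch and the comment are illustrative and not taken from the tutorial.

```isabelle
theory Scratch
  imports Main
begin

lemma "(A & B) ⟹ A"
  (* Typing `sledgehammer` at this point prints suggestions in the
     Output panel; `by simp` is the suggestion reported above. *)
  by simp

end
```

If the `sledgehammer` line is echoed back as part of the goal instead of being run, the text was probably not parsed as a command, which points at the setup or the input rather than at the lemma.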
It's a mystery to me why people say that Sledgehammer doesn't work on Windows. My guess is it's their anti-virus. I use Norton, and it shuts down the Isabelle installation at times, and even polyml, but never the Sledgehammer provers.
However, Sledgehammer starts external provers (z3, spass, cvc4, and e), so it wouldn't be surprising if anti-virus software shut them down.
Don't Believe Me, Seeing is Believing
Here's a screenshot:
Martini's problem with Sledgehammer on Windows
A particular Brazilian with a Ph.D in computer science has had problems with Sledgehammer not running on Windows.
Here is a link to the 2015-May threads: https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2015-May/thread.html.
Search on "Alfio Martini" in your browser to see what he has to say in the different posts. He starts it off here: Re: [isabelle] Isabelle2015-RC3 available for testing.
(Note: That's not the perfect example of his reporting problems with Sledgehammer. In another post, possibly not in May, he talked about Sledgehammer not working at all.)
Posts about problems for a release candidate can be hard to follow because multiple problems get posted under the same title.
Basically, he says that Sledgehammer has never worked for him on Windows. That is the opposite of my experience. Sledgehammer has always worked for me within reason, that is, given the normal "bugs" of a modern piece of software.
In different posts to the list, he says several things, like that he's using Windows 8.1. It appears that he had several anti-virus programs running, one of them being an "old one lying around".
The end result is that it appears he got it running, but never explained in detail what he did to get it running.
He mentioned several problems with Sledgehammer that I'm sure have nothing to do with "Sledgehammer not working at all", such as problems related to cvc4.
If Sledgehammer never returns on a simple lemma, as he describes, there's a major problem that needs to be resolved with the setup.
I have never had a major problem with Sledgehammer on Windows 7. I have had Norton shut down poly.exe and not notify me about it, which caused me a lot of grief until I figured it out.

Related

what's the distinction between `shows` and `obtains` in Isabelle Isar?

I am trying to understand the difference between the shows and obtains commands in Isar (as of Isabelle 2020). The documentation in isar-ref.pdf (p. 137) seems to contain a typo and confuses me.
...
Moreover, there are two kinds of conclusions: shows states several
simultaneous propositions (essentially a big conjunction), while
obtains claims several simultaneous simultaneous contexts of
(essentially a big disjunction of eliminated parameters and
assumptions, cf. §6.6).
shows seems straight forward.
From the limited experience I have so far, it seems that obtains is about proving a conclusion that begins with an existential quantifier, as shown in this question (where the conclusion is existential and then the goal is an obtains).
Is this really the distinction between shows and obtains (universal vs existential)?
If not, what is the proper intended use of obtains?
The lemmas "shows ‹∃x. P x›" and "obtains x where ‹P x›" are very similar, but not entirely identical.
In terms of proofs, the obtains version requires you to provide an explicit witness (look at the fact called that in such a proof). Something similar can be achieved by applying the theorem exI after the shows.
The generated lemmas are different. The obtains version generates an elimination rule instead of an existentially quantified statement, because there is no existential quantifier in Pure. However, the difference rarely matters when using the theorem.
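A small sketch may make the contrast concrete. The lemma names ex_shows and ex_obtains and the property "x > 0" are made up for illustration; the printed theorems in the comments are approximate.

```isabelle
theory Shows_Vs_Obtains
  imports Main
begin

(* shows: the statement is an existential; the witness is supplied via exI *)
lemma ex_shows:
  shows "∃x::nat. x > 0"
  by (rule exI[of _ "1::nat"]) simp

(* obtains: the goal is `thesis`, and the local fact `that` says
   ⋀x. x > 0 ⟹ thesis; instantiating it picks the witness *)
lemma ex_obtains:
  obtains x :: nat where "x > 0"
  by (rule that[of 1]) simp

thm ex_shows    (* roughly: ∃x. 0 < x *)
thm ex_obtains  (* roughly: (⋀x. 0 < x ⟹ ?thesis) ⟹ ?thesis *)

end
```

The second theorem has the shape of an elimination rule, so it can be used directly with obtain or erule, while the first needs exE to get at the witness.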

How to manage all the various proof methods

Is there a "generic" informal algorithm that users of Isabelle follow when they are trying to prove something that isn't proved immediately by auto or sledgehammer? A kind of general way of figuring out whether auto needs additional lemmas, formulated by the user, to succeed, or whether some other proof method would be better.
A related question is: Is there maybe a table to be found somewhere with all the proof methods together with the context in which to apply them? When I'm reading through the Programming and Proving tutorial, the descriptions of the various methods (and of variants of some methods, such as the many variants of auto) are scattered through the text, which constantly makes me go back and forth between text and Isabelle code (which also leads to forgetting what exactly is used for what) and results in a very inefficient workflow.
No, there's no "generic" informal way. You can use try0 which tries all standard proof methods (like auto, blast, fastforce, …) and/or sledgehammer which is more advanced.
After that, the fun part starts.
Can this theorem be shown with simpler helper lemmas? You can use the command "sorry" for assuming that a lemma is true.
How would I prove this on a piece of paper? And then try to do this proof in Isabelle.
Ask for help :) Lots of people on Stack Overflow, #isabelle on freenode and the Isabelle mailing list are waiting for your questions.
For your second question: No, there's no such overview. Maybe someone should write one, but as mentioned before you can simply use try0.
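The "helper lemma plus sorry" step might look like the following sketch. The functions snoc and reverse and the lemma name reverse_snoc are illustrative choices (in the style of the Programming and Proving exercises), not part of the answer above.

```isabelle
theory Helper_Sketch
  imports Main
begin

fun snoc :: "'a list ⇒ 'a ⇒ 'a list" where
  "snoc [] y = [y]"
| "snoc (x # xs) y = x # snoc xs y"

fun reverse :: "'a list ⇒ 'a list" where
  "reverse [] = []"
| "reverse (x # xs) = snoc (reverse xs) x"

(* Helper lemma, assumed for now with sorry so the main proof can be
   explored first; it still has to be proved later (induction on xs). *)
lemma reverse_snoc: "reverse (snoc xs y) = y # reverse xs"
  sorry

(* Main lemma: with the helper available, induction plus simp suffices.
   Typing `try0` instead of the `by` line would list which standard
   methods succeed. *)
lemma "reverse (reverse xs) = xs"
  by (induction xs) (simp_all add: reverse_snoc)

end
```

Once the main proof goes through, the sorry is replaced by a real proof of the helper.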
ammbauer's answer already covers lots of important stuff, but here are some more things that may help you:
When the automation gets stuck at a certain point, look at the available premises and the goal at that point. What kind of simplification did you expect the system to do at that point? Why didn't it do it? Perhaps the corresponding rule is just not in the simp set (add it with simp add:) or some preconditions of the rule could not be proved (in that case, add enough facts so that they can be proved, or do it yourself in an additional step).
Isar proofs are good. If you have some complicated goal, try breaking it down into smaller steps in Isar. If you have bigger auxiliary facts that may even be of more general interest, try pulling them out as auxiliary lemmas. Perhaps you can even generalise them a bit. Sometimes that even simplifies the proof.
In the same vein: Too much information can confuse both you and Isabelle. You can introduce local definitions in Isar with define x where "x = …" and unfold them with x_def. This makes your goals smaller and cleaner and decreases the probability of the automation going down useless paths in its proof search.
Isabelle does not automatically unfold definitions, so if you have a definition, and you want to unfold it for a proof, you have to do that yourself by using unfolding foo_def or simp add: foo_def.
The defining equations of functions defined with fun or primrec are unfolded by anything using the simplifier (simp, simp_all, force, auto) unless the equations (foo.simps) have manually been deleted from the simp set (by lemmas [simp del] = foo.simps or declare foo.simps [simp del]).
Different proof methods are good at different things, and it takes some experience to know what method to use in what case. As a general rule, anything that requires only rewriting/simplification should be done with simp or simp_all. Anything related to classical reasoning (i.e. first-order logic or sets) calls for blast. If you need both rewriting and classical reasoning, try auto or force. Think of auto as a combination of simp and blast, and force is like an ‘all-or-nothing’ variant of auto that fails if it cannot solve the goal entirely. It also tries a little harder than auto.
Most proof methods can take options. You probably already know add: and del: for simp and simp_all, and the equivalent simp:/simp del: for auto. However, the classical reasoners (auto, blast, force, etc.) also accept intro:, dest:, elim: and the corresponding del: options. These are for declaring introduction, destruction, and elimination rules.
Some more information on the classical reasoner:
An introduction rule is a rule of the form P ⟹ Q ⟹ R that should be used whenever the goal has the form R, to replace it with P and Q.
A destruction rule is a rule of the form P ⟹ Q ⟹ R that should be used whenever a fact of the form P is in the premises to replace the goal G with the new goals Q and R ⟹ G.
An elimination rule is something like thm exE (elimination of the existential quantifier). These are like a generalisation of destruction rules that also allow introducing new variables. These rules often appear in things like case distinctions.
The classical reasoner used by auto, blast, force etc. will use the rules in the claset (i.e. that have been declared intro/dest/elim) automatically whenever appropriate. If doing that does not lead to a proof, the automation will backtrack at some point and try other rules. You can disable backtracking for specific rules by using intro!: instead of intro: (and analogously for the others). Then the automation will apply that rule whenever possible without ever looking back.
The basic proof methods rule, drule, erule correspond to applying a single intro/dest/elim rule and are good for single step reasoning, e.g. in order to find out why automatic methods fail to make progress at a certain point. intro is like rule but applies the set of rules it is given iteratively until it is no longer possible.
safe and clarify are occasionally useful. The former essentially strips away quantifiers and logical connectives (try it on a goal like ∀x. P x ∧ Q x ⟶ R x) and the latter similarly tries to ‘clean up’ the goal. (I forgot what it does exactly, I just use it occasionally when I think it might be useful)
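A few of the points above can be tried out on toy goals. This is only a sketch: the definition double and the lemmas are invented for illustration and are not taken from any library.

```isabelle
theory Method_Sketch
  imports Main
begin

definition double :: "nat ⇒ nat" where
  "double n = n + n"

(* Definitions are not unfolded automatically: pass double_def to simp ... *)
lemma "double 2 = 4"
  by (simp add: double_def)

(* ... or unfold it explicitly first. *)
lemma "double 0 = 0"
  unfolding double_def by simp

(* Purely classical reasoning about sets: blast. *)
lemma "A ⊆ B ⟹ B ⊆ C ⟹ A ⊆ C"
  by blast

(* Rewriting combined with classical reasoning: auto. *)
lemma "x ∈ A ∩ B ⟹ x ∈ B ∪ C"
  by auto

(* safe strips quantifiers and connectives; inspect the goal afterwards. *)
lemma "∀x. P x ∧ Q x ⟶ R x"
  apply safe   (* leaves: ⋀x. P x ⟹ Q x ⟹ R x *)
  oops         (* not provable in general; abandoned on purpose *)

end
```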

julia-client: can't render lazy

Could somebody please explain to me what this message might mean?
I have the Julia client running in Atom, and my code works properly and gets me the results, but for some line executions (ctrl+enter) the instant eval gives me "julia-client: can't render lazy".
It appears that behind the scenes the code is executed, but the inline evaluation does not output anything.
The lines corresponding to these messages usually should return 2-dimensional arrays or dataframes, and in Julia the type and the dimensions are usually printed in the eval, but for some specific lines it can't render.
I could not find similar reports anywhere else.
julia version 0.5.0-rc3
This is a problem with package versions being out of sync. If you're on the Julia release (v0.5), this will be fixed with a Pkg.update(). In the future, this kind of question is better suited for the Juno discussion board.

Isabelle: Sledgehammer finds a proof but it fails

Often times I have the problem that sledgehammer finds a proof, but when I insert it, it doesn't terminate. I guess sledgehammer is one of the most important parts of Isabelle, but then it gets annoying if a proof fails.
In the Sledgehammer tutorial,
there is a small chapter on "Why does Metis fail to reconstruct the proof?".
It lists:
Try the isar_proofs option to obtain a step-by-step Isar proof where
each step is justified by metis. Since the steps are fairly small,
metis is more likely to be able to replay them.
Try the smt proof method instead of metis. It is usually stronger,
but you need to either have Z3 available to replay the proofs, trust
the SMT solver, or use certificates.
Try the blast or auto proof methods, passing the necessary facts
via unfolding, using, intro:, elim:, dest:, or simp:, as
appropriate.
The problem is that the first option makes the proof more verbose and also it involves manual intervention.
The second option rarely works.
So what about the third option. Are there any easy to follow heuristics that I can apply?
What's the difference between unfolding and using? Also are there any best practices on how to use intro:, elim:, and dest: from a failed metis proof?
Partial EXAMPLE
proof-
have "(det (?lm)) = (det (transpose ?lm))" by (smt det_transpose)
then have "(det (?lm)) = [...][not shown]"
unfolding det_transpose transpose_mat_factor_col by auto
then show ?thesis [...][not shown]
qed
I would like to get rid of the first line of the proof, as the line seems trivial. If I remove the first line, sledgehammer will still find a proof, but this found proof fails (doesn't terminate).
Concerning your statement sledgehammer is one of the most important parts of Isabelle:
You never need sledgehammer to succeed with a proof. But of course sledgehammer is very convenient and can save a lot of tedious reasoning. Thus it is definitely a very important part for making Isabelle more usable for people who did not spend many years using it (and even for those sledgehammer makes everyday proving more productive).
Coming to your question
Try the blast or auto proof methods, passing the necessary facts via unfolding,
using, intro:, elim:, dest:, or simp:, as appropriate.
[...]
So what about [this] option. Are there any easy to follow heuristics that I can apply?
Indeed there are:
unfolding: This (recursively) unfolds equations, i.e., it is very similar to apply (simp only: ...). The heuristic is, when you do not get the expected result with simp: ... try unfolding ... instead (it might be the case that other equations are interfering).
using: This is used to add additional assumptions to the current subgoal. The heuristic is, whenever a fact does not fit one of the patterns below, try using instead.
intro:: This is used for introduction rules, i.e., of the form that whenever certain assumptions are satisfied some connective (or more generally constant) may be introduced.
Example: A ==> B ==> A & B (where the introduced constant is (&)).
elim:: This is used for elimination rules, i.e., of the form that from the presence of a certain connective (or more generally constant) some facts may be concluded as additional assumptions.
Example: A & B ==> (A ==> B ==> P) ==> P (where the constant (&) is eliminated in favor of explicitly having A and B as assumptions). Note the general form of the conclusion (which is not related to the major premise A & B); this is important in order not to lose provability (see also dest:).
dest:: This is used for destruction rules, i.e., of the form that from the presence of a certain constant some facts may be concluded directly.
Example: A & B ==> B (Note that the information that A holds is lost in the conclusion, unlike in the elim: example.)
simp:: This is used for simplification rules, i.e., (conditional) equations, which are always applied from left to right (thus it is sometimes useful to add [symmetric] to a fact, in order to apply it from right to left, but beware of nontermination, for it is easy to introduce looping derivations in this way).
Having said this, often it is just experience that lets you decide in which way best to employ a given fact inside a proof. What I usually do when I get a proof by sledgehammer which is too slow in Isar is to inspect the facts that were used by the found proof. Then categorize them as above, invoke auto appropriately and if that did not completely solve the goal, apply sledgehammer once more (hopefully delivering an "easier" proof this time).
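To see the three rule shapes in action, here is a minimal single-step sketch using the standard HOL rules conjI, conjE and conjunct2. The lemmas themselves are toy examples chosen for illustration.

```isabelle
theory Rule_Shapes
  imports Main
begin

lemma "A ∧ B ⟹ B ∧ A"
  apply (erule conjE)    (* elimination: A ∧ B becomes the assumptions A and B *)
  apply (rule conjI)     (* introduction: goal B ∧ A splits into B and A *)
   apply assumption
  apply assumption
  done

lemma "A ∧ B ⟹ B"
  apply (drule conjunct2)  (* destruction: A ∧ B is replaced by B; A is lost *)
  apply assumption
  done

end
```

Facts of the same shapes taken from a Sledgehammer proof would be passed to auto or blast via intro:, elim: and dest: respectively, and equations via simp:.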
You ask a number of questions, but I'll take your title and the second paragraph as the essence of your main complaint, where I end up giving a long-winded answer which can be summarized with,
Sledgehammer is part of a three-pronged arsenal,
you becoming more experienced, with never ending experimentation, along with trial and error is the heuristic,
not using many of the proofs which Sledgehammer returns is a big part of using Sledgehammer, and
the minimize and preplay_timeout options can save you some time and frustration by automatically playing the proofs back, which gives you timing information, and sometimes shows that a found proof will fail.
Starting with your second paragraph, you say:
Often times I have the problem that Sledgehammer finds a proof. But then I try it, but the proof doesn't terminate. I guess Sledgehammer is one of most important parts of Isabelle,...
Sledgehammer is important, but I consider it part of a three-pronged arsenal, where the three parts would be:
Detailed proof steps using natural deduction.
Automatic proof methods, such as auto, simp, rule, etc. A big part of this would be creating your own simp rewrite rules, and learning to use theorems with rule and the myriad of other automatic proof methods.
Sledgehammer calling automatic theorem provers (ATPs). Steps 1 and 2, together with experience, are used to set up Sledgehammer. Experience counts for a lot. You might use auto to simplify things so that Sledgehammer succeeds, but you might not use auto because it will expand formulas to where Sledgehammer has no chance of succeeding.
...but then it gets annoying if a proof fails.
So here, your expectations and my expectations for Sledgehammer diverge. These days, if I get annoyed, I get annoyed that I will have to work more than 30 seconds to prove a theorem. If I'm hugely disappointed that a particular Sledgehammer proof fails, it's because I've been trying to prove a theorem for hours or days without success.
Using Sledgehammer not to find proofs, but to find good proofs
Automation can sometimes alleviate frustration. Clicking on a Sledgehammer proof, only to find out that it fails, would be frustrating. Here is the way I currently use Sledgehammer, unless I start becoming desperate for a proof:
sledgehammer_params[minimize=smart,preplay_timeout=10,timeout=60,verbose=true,
max_relevant=smart,provers="
remote_vampire metis remote_satallax z3_tptp remote_e
remote_e_tofof spass remote_e_sine e z3 yices
"]
The options minimize=smart and preplay_timeout=10 are related to Sledgehammer playing back proofs, after it finds them. Not using many of the proofs that Sledgehammer finds is a big part of using Sledgehammer, and proof playback is a big part of culling out proofs.
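The timing options can also be passed to a single invocation instead of being set globally. The sketch below assumes the option names timeout and preplay_timeout as used above; the lemma is the one from the Sledgehammer tutorial and the theory name is made up.

```isabelle
theory Preplay_Sketch
  imports Main
begin

lemma "[a] = [b] ⟹ a = b"
  (* Ask for proofs, replaying each candidate for at most 10 seconds so
     that the reported timings show which ones are worth keeping. *)
  sledgehammer [timeout = 60, preplay_timeout = 10]
  (* After reviewing the suggestions, keep only the chosen proof: *)
  by simp

end
```

The sledgehammer line is a diagnostic command, so it is normally deleted once a proof has been picked.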
Myself, I don't deal much with Sledgehammer proofs that don't terminate, but that's probably because I'm selective to begin with.
My first criterion for a Sledgehammer proof is that it be reasonably fast, and so when Sledgehammer reports that it's found a proof that's greater than 3 seconds long, I don't even try using it, unless I'm desperate to find out whether a theorem can be proved.
The use of Sledgehammer for me usually goes like this:
State a theorem and see if I get lucky with Sledgehammer.
If Sledgehammer gives me a proof that's 30 milliseconds or less, then I consider that a good proof, but I still experiment with try and the automated proof methods of section 9.4.4, page 208, of isar-ref.pdf. Many times I can get a proof down to 5ms or less.
A metis proof of total time over 100ms, I'm willing to work 30 minutes or more to try and get a faster proof.
A metis proof of 200ms to 500ms, I'll resort to everything I know to try and get it down to below 100ms, which many times means converting to a detailed proof.
A smt or metis proof of greater than 1 second I only consider good as a temporary proof.
A proof in the output panel that Sledgehammer reports as being greater than 3 seconds, I usually don't even try, because even if it ends up working, I'm going to have to work to find another proof anyway, so I'd rather spend my time up front trying to find a good proof.
The option 3 heuristic
You say,
So what about the third option. Are there any easy to follow heuristics that I can apply?
The heuristic is:
"as appropriate",
which is to say that the heuristic is "use Sledgehammer as part of a three-pronged arsenal".
The heuristic is also "read lots of tutorials and documentation so that you have lots of other things to use with Sledgehammer". Sledgehammer is powerful, but it's not infinitely powerful, and for some theorems, you can use your own simp rules to prove in 0ms with apply(simp) or apply(auto) what Sledgehammer will never prove.
For myself, I'm up to about 150 to 200 theorems, so the "as appropriate" has much more meaning to me than it used to have. Basically, you try and set up Sledgehammer the way it needs to be set up.
The way Sledgehammer needs to be set up will sometimes mean running auto or simp first, but sometimes not, because many times running auto or simp will doom Sledgehammer to failure.
But sometimes, you don't even want a metis proof from Sledgehammer, except as a preliminary proof until you can find a better proof, which, for me, generally means a faster proof using the automatic proof methods.
I'm no authority on Sledgehammer, but it seems Sledgehammer is good at matching up hypotheses and conclusions from old theorems, with hypotheses and conclusions being used for a new theorem. What it's not good at is proving formulas which I've greatly expanded by using simp and auto.
I continue with the long-winded heuristic that is Sledgehammer centric:
Use Sledgehammer to jump-start the proof process, by proving some theorems with Sledgehammer that you otherwise don't know how to prove.
Turn your theorems which are equivalencies into simp rewrite rules for use with automatic proof methods like simp, auto, fastforce, etc., as described in chapter 9 of tutorial.pdf.
Use some of your theorems for conditional rewrite rules for use with intro and rule.
The last two steps are used to completely solve a proof step or used to set up Sledgehammer "as appropriate". Sledgehammer never ceases to be useful, no matter how much you know, and it's extremely useful when you don't know much, but Sledgehammer alone is not the road to success.
If Sledgehammer can't prove a theorem, then resort to a detailed proof, starting with a bare-bones detailed proof. Sometimes, breaking up an if-and-only-if into two conditionals allows Sledgehammer to easily prove the two conditionals, when it couldn't prove the if-and-only-if.
After you've proved lots of stuff, go back and optimize your proofs. Sometimes, with all the rewrite rules you've created, simp and auto will magically prove things, and you will get rid of some metis proofs that Sledgehammer found for you. Sometimes, you'll use Sledgehammer to find a metis proof that's even faster.
Use this command to optimize timing:
ML_command "Toplevel.timing := true"
There's another SO post giving more detail about it.
I can answer your subquestion "What's the difference between unfolding and using?". Roughly speaking, it works like this.
Suppose lemma foo is of the form x = a+b+c. If you write
unfolding foo
in your proof, then all occurrences of x will be replaced with a+b+c. On the other hand, if you write
using foo
then x=a+b+c will be added to your list of assumptions.
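As a concrete sketch of the two behaviours, take a local assumption named foo of exactly that form. The goal "x * 2 = (a + b + c) * 2" and the restriction to nat are arbitrary choices for illustration.

```isabelle
theory Unfolding_Vs_Using
  imports Main
begin

lemma
  fixes x a b c :: nat
  assumes foo: "x = a + b + c"
  shows "x * 2 = (a + b + c) * 2"
  unfolding foo    (* rewrites x in the goal: (a + b + c) * 2 = (a + b + c) * 2 *)
  by (rule refl)

lemma
  fixes x a b c :: nat
  assumes foo: "x = a + b + c"
  shows "x * 2 = (a + b + c) * 2"
  using foo        (* adds the fact as a premise: x = a + b + c ⟹ x * 2 = ... *)
  by simp

end
```

In the first proof the goal itself changes before any method runs; in the second the goal is untouched and it is up to the method (here simp) to make use of the extra premise.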

Why would Common Lisp (SBCL) use so much memory for a simple program?

Since I'm a newbie to Common Lisp, I tried to solve problems on SPOJ using Common Lisp (SBCL). The first problem is a simple task of reading numbers until the number 42 is found. Here's my solution:
(defun study-num ()
  (let ((num (parse-integer (read-line t))))
    (when (not (= num 42))
      (format t "~A~%" num)
      (study-num))))
(study-num)
The solution is accepted. But when I looked into the details of the result I found it used 57M of MEM! It's bloody unreasonable but I can't figure out why. What can I do to make an optimization?
You are making repeated recursive calls, without enough optimization switched on to enable tail-call elimination (SBCL does do this, but only when you have "optimize for speed" set high and "optimize for debug info" set low).
The Common Lisp standard leaves tail-call elimination as an implementation quality issue and provides other looping constructs (like LOOP or DO, both possibly suitable for this application).
In addition, a freshly started SBCL is probably going to be larger than you expect, due to needing to pull in its runtime environment and base image.
I think you are not realizing that Common Lisp is an online language environment with the full library and compiler loaded into RAM just to give you the first prompt. After that, loading your program is probably a hardly noticeable increase in size. Lisp does not compile and link an independent executable file made of only your code and whatever library routines are reachable from your code. That's what C and similar languages do. Instead, Lisp adds your code into its already sizeable online environment. As a new user it seems horrible. But if you have a modern general purpose computer with 100's of MB of RAM, it quickly becomes something you can forget about as you enjoy the benefits of the online environment. This is also called a "dynamic language environment."
Various Lisp implementations have different ways to create programs. One is to dump an image of the memory of a Lisp system and to write that to disk. On restart this image is loaded with a runtime and then started again. This is quite common.
This is also what SBCL does when it saves an executable. Thus this executable includes the full SBCL.
Some other implementations are creating smaller executables using images (CLISP), some can remove unused code from executables (Allegro CL, LispWorks) and others are creating very small programs via compilation to C (mocl).
SBCL has only one easy way to reduce the size of an executable: one can compress the image.
