Most QA9 readers will know the nonconstructive proof that there are two irrational numbers a and b such that a to the power b is rational.  There is interesting discussion about this here.  I recently came across a 2014 note by Roger Hindley about the history of how this example came to be more broadly known.  It is interesting reading.  Hindley credits a 1953 paper by Dov Jarden in Scripta Mathematica (volume 19, page 229) with the proof.  In case you do not have access to this journal in a local library, I have decided to include a scanned copy of the article below:

It’s a Math tweet from 1953.

In our Computational Logic seminar here at The University of Iowa, we are studying logic programming this semester.  We are using the very nice book “Logic, Programming, and Prolog”, freely available online.  We were talking today about the existence of a least Herbrand model for a definite program.  A definite program is just a set of clauses of the form $A_0 \leftarrow A_1,\ldots,A_m$, where each $A_i$ is an atomic formula (predicate applied to terms).  (Free variables in clauses are interpreted universally.)  If $m = 0$, then we just have an atomic fact $A_0$ in the definite program.  A Herbrand interpretation is a first-order structure where each function symbol $f$ of arity $k$ is interpreted as $\lambda x_1,\ldots,x_k. f(x_1,\ldots,x_k)$, and each predicate is interpreted as a subset of the set of ground (i.e., variable-free) atomic formulas.  A Herbrand model of a definite program P is then just a Herbrand interpretation which satisfies every clause in P.  It will be convenient below to identify a Herbrand interpretation with a subset of the set of all ground atomic formulas.  Such a subset determines the meanings of the predicate symbols by showing for which tuples of ground terms they hold.  We will pass tacitly between the view of a Herbrand interpretation as a first-order structure and the view of it as a set of ground atomic formulas.  The Herbrand base is the Herbrand interpretation corresponding to the set of all ground atomic formulas.  It says that everything is true.

What I want to talk about briefly in this post is the fact that the set of Herbrand models  of definite program P forms a complete partial order, where the ordering is the subset relation, the greatest element is the Herbrand base, and the greatest lower bound of a non-empty subset S of Herbrand models of P is the intersection of all the models in S.  In a complete partial order, every subset S of elements should have a greatest lower bound (though it need not lie in S).  Alternatively — and what I am interested in for this post — we can stipulate that every subset S should have a least upper bound.  The two formulations are equivalent, and the proof is written out below.  “Logic, Programming, and Prolog” contains a simple elegant proof of the fact that the intersection of a non-empty set of Herbrand models is itself a Herbrand model.

What I want to record here is the proof that in general, if in a partial order $(X,\sqsubseteq)$ every subset $S\subseteq X$ (including the empty set) has a greatest lower bound, then every such $S$ also has a least upper bound.  The proof I have seen for this is a one-liner in Crole’s “Categories for Types”.  It took me some puzzling to understand, so I am writing it here as much for my own memory as for the possible interest of others, including others from the seminar who watched me fumble with the proof today!

Let $S$ be a subset of $X$.  Let $\textit{ub}(S)$ be the set of elements which are upper bounds of $S$ (that is, the set of elements $u$ which are greater than or equal to every element of $S$).  The claim is that the greatest lower bound of $\textit{ub}(S)$ is the least upper bound of $S$.  By the assumption that every subset of $X$ has a greatest lower bound, we know that there really is some element $q$ which is the greatest lower bound of $\textit{ub}(S)$.  As such, $q$ is greater than or equal to every other lower bound of $\textit{ub}(S)$.  Now here is a funny thing.  Every element $x$ of $S$ is a lower bound of $\textit{ub}(S)$.  Because if $y\in \textit{ub}(S)$, this means that $y$ is greater than or equal to every element in $S$.  In particular, it is greater than or equal to $x$.  Since this is true for every $y\in \textit{ub}(S)$, we see that $x$ is a lower bound of $\textit{ub}(S)$.  But $q$ is the greatest of all such lower bounds by construction, so it is greater than or equal to the lower bound $x$.   And since this is true for all $x\in S$, we see that $q$ is an upper bound of all those elements, and hence an upper bound of $S$.  We just have to prove now that it is the least of all the upper bounds of $S$.  Suppose $u'$ is another upper bound of $S$.  This means $u'\in\textit{ub}(S)$.  Since by construction $q$ is a lower bound of $\textit{ub}(S)$, this means that $q \sqsubseteq u'$, as required to show that $q$ is the least of all the upper bounds of $S$.

The final interesting thing to note about the complete partial order of Herbrand models of a definite program P is that while the greatest lower bound of a non-empty set $S$ of models is their intersection, and while the greatest element is the Herbrand base (a universal Herbrand model), the intuitive duals of these operations are not the least element nor the least upper bound operation.  The intuitive dual of a universal Herbrand model would be, presumably, the empty Herbrand interpretation.  But this need not be a model at all.  For example, the definite program P could contain an atomic fact like $p(a)$, and then the empty Herbrand interpretation would not sastisfy that fact.  Furthermore, if $S$ is a non-empty set of Herbrand models, $\bigcup S$ is not the least upper bound of $S$.  That is because $\bigcup S$ need not be a Herbrand model of P at all.  Here is a simple example.  Suppose P is the definite program consisting of clauses $\textit{ok}(h(a,b))$ and $\textit{ok}(h(x,y)) \leftarrow \textit{ok}(x),\textit{ok}(y)$.  Consider the following two Herbrand models $H_1$ and $H_2$ of this program P. In $H_1$ the interpretation of $\textit{ok}$ contains all the terms built using $h$ from $a$ and $h(a,b)$.  In $H_2$, the interpretation of $\textit{ok}$ contains all the terms built using $h$ from $b$ and $h(a,b)$.  If we take the intersection of $H_1$ and $H_2$, then it is a Herbrand model, in fact the minimal one: it says that $\textit{ok}(h(a,b))$ is true, as required by the first clause in P; and if two terms $t_1$ and $t_2$ are in the interpretation of $\textit{ok}$, then so is $h(t_1,t_2)$.  But if we take the union of $H_1$ and $H_2$, what we get is not a Herbrand model of P at all.  Because $H_1 \cup H_2$ contains $\textit{ok}(h(a,a))$ and $\textit{ok}(h(b,b))$, for example, but not $\textit{ok}(h(h(a,a),h(b,b)))$.  To get an upper bound of $H_1$ and $H_2$, it is not enough to take their union.  One must take their union and then close them under the deductive consequences of the program P.  That’s the intuition, though we would need to formally define closure under deductive consequences — and it would be a bit nicer to be able to apply a model-theoretic notion (since we are working model-theoretically here) rather than a proof-theoretic one.   Declaratively, we know we can get the least upper bound of a set $S$ of Herbrand models as the intersection of the set of all Herbrand models which are supersets of every model in $S$.  But this is rather a hard definition to work with.

Anyhow, this is a nice example of finding an interesting abstract structure in semantics, as well as a good exercise in reasoning about such structures.

I haven’t yet started repeating myself — though there’s every chance you’ll hear it here twice — but iteration is the sort of thing one can find just one use after another for. I mean, if you’ve seen it once, you’ve seen it a thousand times: iteration delivers repeatedly. How many times have you iterated to good effect? I say again: is iteration great or what?

Ok, got that out of my system. :-) I am working on lambda encodings right now, and with Church-encoded data, every piece of data is its own iterator. So the encoding tends to make one think of algorithms in terms of iteration. We have a function f, and a starting point a, and we wish to apply f to a in a nested fashion, n times: $f^0(a) = a$ and $f^{n+1}(a) = f(f^n(a))$. To multiply numbers N and M, for example, we can iterator the function “add M” on starting point 0, N times. And other natural algorithms have iterative formulations.

What about division? Usually in total type theories (where it is required that uniform termination of every function — that is, termination on all inputs — has to be confirmed statically by some termination-checking algorithm or technique), natural-number division is implemented using well-founded recursion. The idea is that to divide x by y, we basically want to see how many times we can subtract y from x until x becomes smaller than y (at which point it is the remainder of the division). So one wants to make a recursive call to division on x – y, and since that quantity is not the predecessor of x (or y), the usual structural decrease demanded by the termination checker is not satisfied. So the usual simple schemes for observing termination statically cannot confirm that division is terminating. And indeed, if y were 0, there would be no structural decrease. So it is not a completely trivial matter. The solution one finds in Coq (Arith/Euclid.v in the standard library for Coq version 8.4) and Agda (Data/Nat/DivMod.agda in the standard library version 0.8) is to use well-founded recursion. This is a somewhat advanced method that uses a generalized inductive type to encode, effectively, all the legal terminating call sequences one could make using a given well-founded ordering. Then we can do structural recursion on an extra argument of this generalized inductive type.

Well-founded recursion is really quite cool, and it’s amazing to see the power of the type theory in the fact that well-founded recursion is derivable, not primitive, to the language. Every student of type theory should try walking through the definitions needed for well-founded recursion over, say, that natural number ordering <. But as elegant and impressive as it is, it's a pretty heavy hammer to have to get out. For starters, if you want to reason later about the function you defined by well-founded recursion, you are most likely going to have to use well-founded induction in that reasoning. So you find yourself continually setting up these somewhat complicated inductions to prove simple lemmas. A second issue is that at least in Agda, because there is no term erasure explicit in the language, if you write a function by well-founded recursion, you are going to be manipulating these values of the generalized inductive datatype at runtime. I reported earlier on this blog that in my experience this led to a major, major slowdown for running code extracted from Agda. So if you are just doing some formal development to prove a theorem, then well-founded recursion won't cause you serious problems in Agda. But if you want to extract and run code that uses well-founded recursion, you likely will see major performance issues.

In my standard library for Agda, the version of natural-number division defined by well-founded recursion is in nat-division.agda:

 {- a div-result for dividend x and divisor d consists of the quotient q, remainder r, and a proof that q * d + r = x -} div-result : ℕ → ℕ → Set div-result x d = Σ ℕ (λ q → Σ ℕ (λ r → q * d + r ≡ x))

 div-helper : ∀ (x : ℕ) → WfStructBool _<_ x → (y : ℕ) → y =ℕ 0 ≡ ff → div-result x y div-helper x wfx 0 () div-helper x (WfStep fx) (suc y) _ with 𝔹-dec (x =ℕ 0) ... | inj₁ u = 0 , 0 , sym (=ℕ-to-≡ u) ... | inj₂ u with 𝔹-dec (x < (suc y)) ... | inj₁ v = 0 , (x , refl) ... | inj₂ v with (div-helper (x ∸ (suc y)) (fx (∸< {x} u)) (suc y) refl) ... | q , r , p with <ff {x} v ... | p' with ∸eq-swap{x}{suc y}{q * (suc y) + r} p' p ... | p'' = (suc q) , (r , lem p'') where lem : q * (suc y) + r + suc y ≡ x → suc (y + q * suc y + r) ≡ x lem p''' rewrite +suc (q * (suc y) + r) y | +comm y (q * (suc y)) | +perm2 (q * (suc y)) r y = p''' 

_÷_!_ : (x : ℕ) → (y : ℕ) → y =ℕ 0 ≡ ff → div-result x y x ÷ y ! p = div-helper x (wf-< x) y p 

This code returns a value of type div-result x y, which contains the quotient q, remainder r, and the proof that x equals y * q + r. It is not as simple as one would like, due to the use of well-founded recursion.

But we can avoid well-founded recursion for defining division, if we go back to our old friend iteration (“There he is again!” — sorry, I said I had that out of my system, but apparently not quite). Because we know that we will not possibly iterate subtraction of y from x more than x times, if y is not 0. So we can pass an extra argument in to division which is a counter, that we start out at x. Again we use the div-result type, but this time there is no need for well-founded recursion:

 divh : (n : ℕ) → (x : ℕ) → (y : ℕ) → x ≤ n ≡ tt → y =ℕ 0 ≡ ff → div-result x y divh 0 0 y p1 p2 = 0 , 0 , refl divh 0 (suc x) y () p2 divh (suc n) x y p1 p2 with keep (x < y) divh (suc n) x y p1 p2 | tt , pl = 0 , x , refl divh (suc n) x y p1 p2 | ff , pl with divh n (x ∸ y) y (∸≤2 n x y p1 p2) p2 divh (suc n) x y p1 p2 | ff , pl | q , r , p = suc q , r , lem where lem : y + q * y + r ≡ x lem rewrite sym (+assoc y (q * y) r) | p | +comm y (x ∸ y) = ∸+2{x}{y} (<ff{x}{y} pl)

 

_÷_!_ : (x : ℕ) → (y : ℕ) → y =ℕ 0 ≡ ff → div-result x y x ÷ y ! p = divh x x y (≤-refl x) p 

You can find this in nat-division2.agda. The code is also a bit less cluttered with helper lemmas, although we still do need to require that x is less than or equal to n, in order to rule out the case that we run out of counter budget (n) before we are done dividing x.

This example shows that sometimes iteration is sufficient for defining functions like division whose natural definition is not structurally recursive. The moral of the story is that we should not forget about iteration. And that is a lesson worth repeating!

Well, I am embarrassed at how late I am in posting the solution to the puzzle I mentioned in my last post.  It has been a busy summer with taking care of our sweet new baby at home, and running StarExec development at work.  Anyhow, below is a graph with the minimum number of nodes which contains every legal possible combination of properties from termination (aka strong normalization), normalization (aka weak normalization), confluence (aka Church-Rosser), and local confluence (aka Weak Church-Rosser), and their negations.  This graph was found by Hans Zantema, whom I asked about this puzzle by email (he agreed to let me share his solution here).  Furthermore, he argues that 11 is the minimal number of nodes, as follows.  Out of the 16 possible combinations of properties SN, WN, CR, and WCR and their negations, we exclude immediately the combinations with SN and ~WN (since SN implies WN) and CR and ~WCR (since CR implies WCR).  So there are three legal possibilities for the values of CR and WCR, and three for the values of SN and WN.  These are independent, so there are 9 legal combinations of properties.  Now, Hans argues, since there is a node X which is SN and ~WCR, there must be two nodes which are SN and CR.  For since X is SN but not WCR, it has two children (which are still SN) which cannot be joined.  We may assume these children are themselves CR, otherwise we could repeat this observation and the graph would not be minimal.  Similarly, since there is a node which is ~WN and ~WCR, there must be two nodes which are ~WN and CR.  So there must be at least 11 nodes.  And the graph below has 11 nodes.  To test your knowledge, you can try to identify which combination of properties each node has!  Fun!

Suppose we have a graph (A,->) consisting of a set of objects A and a binary relation -> on A.  This is a simple case of an abstract reduction system, as defined in the Terese book (in the more general case, we have not just one relation ->, but an indexed set of relations).  In the theory of abstract reduction systems, an element x is confluent iff whenever there is a path from x to y and a path from x to z, then there exists some element q which is reachable from both y and z.  An element x is locally confluent iff whenever there is an edge (not an arbitrary path) from x to y and an edge from x to z, then there is some element q reachable from both y and z.  So confluence implies local confluence, but (rather famously) the reverse implication holds only for terminating systems.  An element is terminating iff there are no infinite paths from that element.  An element is normalizing iff there exists a path from that element to a normal form, which is an element that has no outgoing edges.  So terminating implies normalizing.

We have these four properties: confluence, local confluence, termination (sometimes also called strong normalization), and normalization (sometimes called weak normalization).  What is the smallest graph that is property diverse, in the sense that for every consistent combination of properties, the graph contains an element with that combination of properties?  (The consistency requirement for the set of properties for an element arises because confluence implies local confluence and termination implies normalization).

I will post the answer to this (with a nice picture) Monday…