I would like to collect a list of examples where non-constructive reasoning is either essential or makes something significantly more convenient. I am still digesting Russell O’Connor’s work on “Classical Mathematics for a Constructive World” — Russell has many interesting examples there, including using classical reasoning for the constructive reals (for which basic relations like equality are undecidable). One is for verification of hybrid systems (that paper will be appearing shortly), which is certainly an eminently practical and computational application. The others are more mathematical in nature, as the title of the paper suggests. This leads me to refine my quest a little: I’d like to see examples of computational problems where non-constructive reasoning is either essential or very useful. Here are a few examples I know of so far. I’d be very grateful for any others you know of or think of that you’d be willing to share.

**Termination case.** In the Trellys project (particularly “Iowa Trellys”), we have made use of the ability to case split on whether or not a term terminates under the call-by-value operational semantics of the language. One great use for this feature is to provide simpler theorems about functions which have not been proven total (either because they are truly partial, or because we just decide not to invest our verification effort to do so). For example, in Iowa Trellys, quantifiers range over terms of the stated type, not values. To prove a theorem like

forall x . x + 0 = x

(assuming plus recurses on its first argument, so the theorem is not immediate), it is thus very handy to be able to case split on whether or not x has a value. If it does not, then both sides of the equation diverge and are hence equal (since we equate all diverging terms). If x does have a value, then we can do induction on it to prove the theorem in a standard way.
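For concreteness, the plus example can be sketched in Haskell (Nat, Z, Suc, and plus are names I am introducing here, not Trellys syntax). Since plus recurses on its first argument, the equation plus x Z = x is not definitional, and proving it requires induction on x:

```haskell
-- A sketch of the plus example. Recursion is on the first argument,
-- so "plus x Z = x" holds only by induction on x.
data Nat = Z | Suc Nat deriving (Eq, Show)

plus :: Nat -> Nat -> Nat
plus Z       y = y
plus (Suc x) y = Suc (plus x y)
```
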

Similar examples arise even if the quantifiers of the language range only over values. Suppose we have defined a tail-recursive reverse-map function (our implementation’s notation is not this nice :-():

rev_map f [] l' = l'

rev_map f (x:l) l' = rev_map f l (f x : l')

Suppose we want to prove the following basic theorem about this function:

forall f, l1, l2, l' . rev_map f (l1 ++ l2) l' = rev_map f l2 (rev_map f l1 l')

The proof is by induction on l1. In the inductive case, where l1 is x:l1′ (say), we need to instantiate the induction hypothesis with f, l1′, l2, and f x : l′. But the latter is not known to terminate, and in general might not, since f x might very well diverge. Of course we could choose to prove the theorem only for total functions f, but that is strictly weaker than what we could hope to achieve: the theorem is also true if f is partial. For if f x diverges, then so will both sides of the above equation. With termination case, one can prove desirable theorems about higher-order functions in a general-recursive setting, where a purely constructive approach would only support strictly weaker theorems.
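Here is a Haskell rendering of rev_map (revMap is my name for it here), along with a spot-check of the theorem on total inputs. Note one caveat: Haskell is lazy, so unlike the call-by-value setting of the post, a diverging f x would not make revMap diverge here; the block is only meant to pin down the equations.

```haskell
-- A Haskell rendering of the tail-recursive reverse-map above.
revMap :: (a -> b) -> [a] -> [b] -> [b]
revMap _ []      l' = l'
revMap f (x : l) l' = revMap f l (f x : l')
```

On total inputs, instances of the theorem revMap f (l1 ++ l2) l' = revMap f l2 (revMap f l1 l') can then be tested directly.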

**Computational induction.** For quite some time in our work on Iowa Trellys, we have been vexed by the lack (in our current system) of a principle of induction on the length of a terminating computation. For example, suppose you have defined an interpreter for SK combinators (where I am writing embedded combinator terms like App(S,K) as just S K):

interp S = S

interp (S x) = S (interp x)

interp (S x y) = S (interp x) (interp y)

interp K = K

interp (K x) = K (interp x)

interp (x y) = app (interp x) (interp y)

app S x = S x

app (S x) y = S x y

app (S x y) z = interp (x z (y z))

app K x = K x

app (K x) y = x
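A runnable Haskell rendering of this interpreter may help (Term and App are names I am introducing, with App S x playing the role of the embedded term S x). The special cases for partially applied S and K collapse into the final clause of app, which builds the partial application as a value; the interpreter can of course still loop on terms with no normal form, which is the whole point.

```haskell
-- A sketch of the SK interpreter in Haskell; App S x encodes "S x".
data Term = S | K | App Term Term deriving (Eq, Show)

interp :: Term -> Term
interp S         = S
interp K         = K
interp (App x y) = app (interp x) (interp y)

app :: Term -> Term -> Term
app (App (App S x) y) z = interp (App (App x z) (App y z))  -- S x y z ~> x z (y z)
app (App K x)         _ = x                                 -- K x y ~> x
app f                 x = App f x                           -- partial application is a value
```
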

Suppose you wish to prove that it is idempotent:

forall x . interp (interp x) = interp x

A natural way to prove this is first to case split on whether or not interp x terminates. If it does not, then both sides of the above equation diverge under call-by-value semantics, and so we equate them. If interp x does terminate, then we would like to reason by induction on some measure of the terminating computation; here I will use nesting depth of recursive calls. For example, if we use the second equation above for interp, by the IH we know that (interp (interp x)) equals interp x, which shows that (interp (interp (S x))) equals interp (S x). And in the case for the third equation for app, by the IH we know that (interp (interp (x z (y z)))) equals interp (x z (y z)), and hence (since interp (x z (y z)) is the value returned by interp in the situation that led to this call to app), we also know interp is idempotent on the original term.

Here is a way (that I just realized in the middle of the night yesterday) that we can use classical reasoning to derive a principle of computational induction for functions like interp above. First, we have to define a counter-indexed version of the function. This is in addition to the regular version of the function without counters. For interp, a counter-indexed version could be (I will just call both this and the original function “interp”):

interp n S = S

interp (n+1) (S x) = S (interp n x)

interp (n+1) (S x y) = S (interp n x) (interp n y)

interp n K = K

interp (n+1) (K x) = K (interp n x)

interp (n+1) (x y) = app n (interp n x) (interp n y)

interp 0 x = abort

app n S x = S x

app n (S x) y = S x y

app n (S x y) z = interp n (x z (y z))

app n K x = x

app n (K x) y = x
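A hedged Haskell sketch of the counter-indexed version, with Nothing playing the role of abort (interpN and appN are my names; as in the earlier sketch, the partial-application cases are collapsed into one clause). Note that, as in the equations above, the base constructors S and K succeed at any counter value, and only recursive calls consume fuel:

```haskell
data Term = S | K | App Term Term deriving (Eq, Show)

interpN :: Int -> Term -> Maybe Term
interpN _ S = Just S
interpN _ K = Just K
interpN 0 _ = Nothing            -- counter exhausted: "abort"
interpN n (App x y) = do
  x' <- interpN (n - 1) x
  y' <- interpN (n - 1) y
  appN (n - 1) x' y'

appN :: Int -> Term -> Term -> Maybe Term
appN n (App (App S x) y) z = interpN n (App (App x z) (App y z))
appN _ (App K x)         _ = Just x
appN _ f                 x = Just (App f x)
```
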

Now our goal is to prove ∃ n. interp n x ↓, from our assumption interp x ↓. Here I am writing t ↓ to mean that t terminates. Let us case split on whether or not ∃ n. interp n x ↓. Such a case split is non-constructive, since if it were constructive we could effectively decide halting for SK combinators, which is impossible. In the case where that formula holds, we are obviously immediately done. So suppose it does not hold. Then (intuitionistically) we have ¬ ∃ n. interp n x ↓. Let us imagine that we have a coinductive predicate t ⇑ for divergence, and suppose we know (presumably by an axiom) that t ⇑ implies ¬ (t ↓). Let us now try to prove the following formula:

interp x ⇑

The proof is by coinduction: we must produce a proof of interp x ⇑, where we use the co-IH only where it is guaranteed to be guarded by constructors for our coinductive type of divergence in the final return value of the co-inductive proof. How does this proof work? We first case split on x, to enable some partial evaluation of interp x. For example, suppose x is (S x′), and so interp x = S (interp x′). We can prove ¬ ∃ n. interp n x′ ↓ by contradiction: if there exists n where interp n x′ terminates, then interp (n+1) x terminates, contradicting our assumption ¬ ∃ n. interp n x ↓. So we can use our co-IH to obtain a proof that interp x′ diverges, which we can then wrap in a constructor for the divergence type to show interp x ⇑ in a guarded way. The other cases proceed similarly (I believe: I did not work through them all for interp). Since we have derived interp x ⇑ from our assumption ¬ ∃ n. interp n x ↓, and since interp x ⇑ contradicts (by the principle stated above) interp x ↓, which we are assuming, we have derived a contradiction and the proof is completed.

I’d be delighted to hear of other uses of classical reasoning for computational problems you know of!

## 14 Comments

Your examples above only appeal to Markov’s principle, as far as I can tell. There was obviously a reason the Russian constructivists adopted it. Do you know of an example (for computational problems) that requires something classical beyond Markov’s principle?

On the other hand, speaking of Russian recursive mathematics, do you know of an example of a practical computational problem where the formalized Church’s Thesis is helpful? (Which of course is non-classical.)

Hello, Ulrik. Thanks for this stimulating comment! But I am confused about something (not having studied Markov’s principle too much before now). If I’ve understood correctly, Markov’s principle can be formulated as saying that if one has a decidable predicate P, and if one knows that it is not the case that P(x) is false for every x, then one can conclude that there is some x where P(x) is true. Of course this is obvious classically. Constructively, the idea is that we could compute an x where P(x) is true (assuming we can enumerate the possible values of x) by just trying them all. Since we know P is not universally false, our program that is testing each possible value to see if it satisfies P(x) is guaranteed to halt. But it is not necessarily constructively guaranteed to halt. And that is where some reject the principle on constructivist grounds.
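To make that search concrete, here is a minimal Haskell sketch (markovSearch is my name for it). The program itself is perfectly computable; what Markov's principle asserts is precisely that, given ¬ ¬ ∃ x. P(x), this search is total:

```haskell
-- Unbounded linear search for a witness of a decidable predicate.
-- Writing the program is easy; proving it total from ¬¬∃x.P(x)
-- is exactly an instance of Markov's principle.
markovSearch :: (Integer -> Bool) -> Integer
markovSearch p = head [ n | n <- [0 ..], p n ]
```
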

Here I think I am assuming something stronger: for what would the realizer be for the formula (sorry, not sure if Latex mode works in the comments) “forall x. (interp x) halts \/ (interp x) diverges” (in terms of my SK combinator example). It seems like it would have to be a function which given any value v for x, returns either (0,d) where d is a realizer of (interp v) halts; or else (1,d’) where d’ is a realizer of (interp v) diverges. But such a function is truly not computable. This seems different from the case with Markov’s principle, where there is a computable function, but the issue is simply that we might not be able to prove, in general or in a particular constructive logic, that this function is correct.

I would love to be enlightened if I have not correctly understood these issues or you otherwise disagree. Even if I am right, one could still be interested in examples that use more highly non-constructive principles, assuming there is some notion of degrees of constructivity (I have not looked if that has been studied).

Regarding the formalized Church-Turing thesis: if that takes the form of a proof that every program written in one programming language has an equivalent program in another, I would think something like CompCert (a compiler which is proven in Coq to be semantics-preserving) would be an example. But again, I don’t know much about the formalized CT thesis either, so I would be grateful for correction if I’m missing something.

Cheers, and greetings from Iowa (where I believe the weather is, amazingly, isomorphic to Palo Alto’s today),

Aaron

(I hope the unicode in the following is readable.) I agree with your statement of Markov’s principle, and I completely agree that the formula you mentioned does not have a realizer, so if you do a case split on that particular disjunction, you’re going beyond recursive mathematics.

Here’s why it reminded me of Markov’s principle, though: To prove ∃ n. interp n x ↓ by case splitting on that formula just uses the implication (¬ ¬ ∃ n. interp n x ↓) → ∃ n. interp n x ↓, and this is an instance of Markov’s principle when definedness is a Σ₁ formula, as it is in recursive math (namely, there exists a computation trace verifying that the computation halts).

However, things may be different in your setting. What logic are you using to handle partiality and definedness?

By the way, your first example, the idempotence of interp, seems to be smoothly handled in the logic of partial terms, where s ≅ t is defined to mean (s ↓ ∨ t ↓) → s = t. So there you’d be allowed to assume interp x ↓ when proving idempotence. (Oh, and you still have an instance of “involution” instead of “idempotent” in your post.)

Church’s Thesis is something like ∀ f : ℕ → ℕ. ∃ e : ℕ. ∀ n. f(n) ≅ {e}(n). That is, every constructive function is computed by a program. I guess it could come in handy when reasoning about higher-order programs, but I haven’t seen an example where it did.

Cheers and best wishes from San Francisco,

Ulrik

> assuming there is some notion of degrees of constructivity (I have not looked if that has been studied).

Depending on how you mean, indeed there are. The so-called “intermediate logics” are those which lie between intuitionistic logic and classical logic. Each of them is based on adding to IL some axiom(s) which allow proving more things, up to (but not quite including) everything CL does. Whether this lattice of logics coincides with “degrees of constructivity” is another matter.

One thing you’ll notice if you follow that wikipedia link is that intermediate logics are deeply tied up with various modal logics. The fact that we can embed classical truths into intuitionistic logic via double-negation transforms is itself a modal operator. Many of these modalities can be given computational significance, typically by using (co)monads as @FredericKoehler suggests. The double-negation transform is just a version of the continuation monad, IIRC. Thus, in a formal sense, classical reasoning can’t teach you anything new; though, in a colloquial sense, the change of perspective surely can and does teach us new things. No one would want to work with the raw continuation monad without any of the sugar we’ve built up for it, after all.
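To make the continuation-monad reading of double negation concrete, here is a minimal Haskell sketch (DNeg and lem are names I am introducing): reading ¬A as A → r for an arbitrary answer type r, excluded middle becomes realizable under the double-negation modality.

```haskell
-- ¬¬A as a continuation type, parametric in the answer type r.
type DNeg r a = (a -> r) -> r

-- Excluded middle holds under double negation: ¬¬(A ∨ ¬A).
-- The realizer "bluffs": it first offers ¬A, and if that refutation is
-- ever applied to an actual A, it backtracks and offers the A instead.
lem :: DNeg r (Either a (a -> r))
lem k = k (Right (\a -> k (Left a)))
```
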

Hi, Wren. Thanks for the reminder about intermediate logics. Do you know if anyone has looked at computational interpretations of such logics, as has been done for classical logic in the form of the lambda-mu calculus (and a number of related systems)?

A paper related to Ulrik’s earlier comment and this one, too, is Hugo Herbelin’s “An intuitionistic logic which proves Markov’s principle”:

http://pauillac.inria.fr/~herbelin/publis/lics-Her10-markov.pdf

Aaron

> assuming there is some notion of degrees of constructivity (I have not looked if that has been studied).

Ooh, I meant to comment on this as well. Besides the intermediate logics, we also have Constructive Reverse Mathematics, where the focus is on degrees of non-constructive mathematical principles, rather than non-constructive logical principles (assuming such a division can be made). This paper by Ishihara is a good place to start: http://philosophiascientiae.revues.org/406

Do you mean ‘idempotent’ when you say ‘involution’?

Dan,

Oops, you are right. I fixed that now. Thanks for the bug report.

Aaron

I wonder how much of this really requires classical reasoning: what about using the partiality monad? At least the proof that forall x. x + 0 = x for partial x is very easy (http://hpaste.org/78038). After all, LEM pretends to give us a witness for nontermination, but we never really need that — so perhaps it’s overkill.
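For readers following along, the partiality monad being referred to can be sketched in Haskell (Delay, runFor, and plusD are names I am introducing; the Agda versions are coinductive, which Haskell's laziness approximates): a computation either yields a value now, or takes one observable step and continues.

```haskell
-- Capretta-style partiality (delay) monad, sketched in Haskell.
data Delay a = Now a | Later (Delay a)

instance Functor Delay where
  fmap f (Now a)   = Now (f a)
  fmap f (Later d) = Later (fmap f d)

instance Applicative Delay where
  pure = Now
  Now f   <*> d = fmap f d
  Later g <*> d = Later (g <*> d)

instance Monad Delay where
  Now a   >>= f = f a
  Later d >>= f = Later (d >>= f)

-- Observe a computation for at most n steps.
runFor :: Int -> Delay a -> Maybe a
runFor _ (Now a)   = Just a
runFor 0 (Later _) = Nothing
runFor n (Later d) = runFor (n - 1) d

-- The plus example over a possibly diverging first argument (host addition
-- stands in for the recursive plus); "plusD dx 0 ≅ dx" is the theorem.
plusD :: Delay Integer -> Integer -> Delay Integer
plusD dx y = dx >>= \x -> Now (x + y)
```

If dx diverges (an infinite tower of Later), both plusD dx 0 and dx are the same infinite stream of steps, which is exactly the bisimulation argument in the linked proof.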

Hi, Frederic. Thanks for that code snippet (in Agda, right?). That’s certainly an elegant way to do this, and I agree this seems to address some of what I was talking about, regarding quantifiers ranging over possibly diverging terms.

But what about my reverse-map example? Do you think you can do that one in a similar way? The rules of that game would be, I think, that quantifiers are ranging over total objects only, and we want to prove the theorem for any function f, including one which might diverge. Not being an Agda user myself, I’d be very curious to see how that one could be done in Agda.

In general, using the partiality monad makes me a bit nervous, since it seems like a somewhat drastic change of perspective. What if I don’t want to use coinductive types, for example? After all, they are not nearly as well understood, at least for practical use, as inductive types; for example, compositional approaches to co-termination seem to be just developing, and are not there yet. Now, one could say that adding classical reasoning is also a drastic change, but classical reasoning, in general, is not nearly as exotic (although computational interpretations are, so maybe it’s a tie).

Cheers,

Aaron

Hi Aaron,

Here is a proof for reverse-map using agda + the partiality monad: http://hpaste.org/78137

I warn you that I am no great agda programmer! But the argument is basically what you suggest. Full details about dealing with f x are hidden behind this >>=-cong combinator, but it is hopefully not too surprising that it works: both sides of the equality are (after normalization) of the form f x >>= (something), and so if f x diverges forever the proof of equality is also just an infinite stream of “later” terms. Once it converges, the proof is as expected.

PS I assumed you meant rev_map f (x:l) l’ = rev_map f l (f x : l’)

Hi again, Frederic. I thought more about your suggestion about the partiality monad, and I have a few more concerns about it.

1. Here is a common idiom I encountered in both Guru and Trellys programming (research languages developed in my group and with others). Write a function in the general-recursive fragment of the language, where termination need not be statically confirmed. Then later, prove externally that the function is total. Such a statement of totality can be expressed by saying something like “for all (total) inputs to the function, there exists a (total) value which equals the result of the function applied to the inputs”. Once termination is proved, the function can be used in the total fragment of the language. I am not sure how easy this workflow would be to carry out with the partiality monad.

2. In some cases, like the nice plus example you included, it is quite clear and natural to write the function in the partiality monad. For other cases, I am less confident. For example (as another challenge!), can you write my SK interpreter in the partiality monad? Now, I was reviewing Venanzio Capretta’s “General Recursion Via Coinductive Types” (which I have blogged about before), and he has a construction for a fixed point combinator that can take a function of type “(A -> partial B) -> (A -> partial B)”, and produce a least (in a suitable sense) fixed point of type A -> partial B. That is amazing, and in one sense completely addresses the issue I am raising about expressing functions in the partiality monad.

But! Capretta’s construction works by parallel search through all the finite approximations of this fixed point, looking for the first to converge on a given input. This is theoretically adequate, and sounds remarkably like the search underlying Markov’s principle that we were discussing earlier. But from a practical perspective this is going to be woefully inefficient.
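A minimal sketch of that search-through-approximations idea, with Maybe standing in for definedness of a finite approximation (fixApprox and phi are my names; this is only illustrative, not Capretta's actual coinductive construction): iterate the functional from the everywhere-undefined function, and on a given input try the k-th approximation for k = 0, 1, 2, ….

```haskell
-- Fixed point by searching the finite approximations phi^k(undefined).
-- Diverges if no approximation ever converges on the input; note the
-- repeated re-evaluation of earlier approximations -- the inefficiency
-- discussed above.
fixApprox :: ((a -> Maybe b) -> (a -> Maybe b)) -> a -> b
fixApprox phi a =
  head [ r | k <- [0 ..], Just r <- [(iterate phi (\_ -> Nothing) !! k) a] ]
```
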

So for these two reasons, I (and others) would like something more direct. In the end, I expect the partiality monad or something like it would be sufficient to implement these abstractions for general recursion; but there is likely to be a much preferable direct implementation in terms of a general-recursion operator.

Thanks again for the discussion (and I hope you take up one of my challenge problems),

Aaron

Hi, Frederic. I dub you Champion of the partiality monad and lord of all you survey! 🙂 Thanks for sharing your solution to the reverse-map problem. In fact, the monad really is doing a lot of work here, in the sense you described: case-splitting on convergence or divergence of f x is completely hidden, which does seem like a significant win for this approach. I can see it is time for me to start playing with Agda — these examples come out quite elegantly there.

Thanks again for making me think more about this,

Aaron

Hi, Aaron. This is a nice blog post, and with lots of interesting conversation.

I think most, if not all, of classical reasoning could be done with monads. Indeed, it might be the case that everything control offers can be done using monads. In fact there is even a monadic translation of Curien and Herbelin’s λ̄μμ̃-calculus, http://repositorium.sdum.uminho.pt/bitstream/1822/20889/1/main.pdf.

That being said, a primitive notion of control has some advantages over monads. When using monads one quickly finds oneself in a state of “monad hell”, stuck within one’s chosen group of monads. I also think that monads quickly complicate programs. It might also be the case that monads make it harder to verify the correctness of programs, but I am not convinced of this yet. Does anyone have any idea if this is true?

I am not an anti-monad advocate, but as Aaron knows I am slightly biased towards control.

Great stuff.

Harley