In our seminar for the Computational Logic Group at Iowa, we have been studying Logic Programming for the past couple semesters.  We are continuing this semester, and have been reading “The Stable Model Semantics for Logic Programming” by Gelfond and Lifschitz (available on Citeseer here), which is a classic paper describing a semantics for logic programs with negation.  I was interested to read this paper, as I am interested in datatypes in type theory and programming languages with negative occurrences (in the types of the constructors) of the type being defined.  I am hoping that maybe some ideas from LP could possibly carry over to semantics for such datatypes.  This is intellectually a bit of a longshot, but the paper is quite nice reading for anyone interested in computational logic.

Anyway, I want to record here a few notes from the discussion this past Friday, of Alain Mebsout, Nestan Tsiskaridze, Ruoyu Zhang, Baoluo Meng, and myself. First, some background from the paper.  A clause is an implication where there is a conjunction, possibly empty, of literals (either atomic formulas or negated atomic formulas) in the antecedent of the implication, and a single atomic formula as the consequent.  The authors work just with ground (i.e., variable-free) clauses, since a set of nonground clauses can be modeled as the (possibly infinite) set of all its ground instances.  A program Pi is a set of clauses.  The authors define an operation which I will denote here Pi/M, where M is a set of atomic formulas.  The operation drops “not A” from the formula for every atom A which is not in M.  And it drops every clause that contains “not A” (in the antecedent) where A is in M.  The intuition is that we can pretend M is exactly the set of atoms we know to be true, and then simplify the clauses of Pi by looking at negated literals.  If you negate an atom which is in M, then you are negating something which we are pretending is true, and hence the negation is definitely false.  A clause containing such a literal is satisfied if we view M as determining a model (namely, the term model where exactly the atoms in M are made true by the interpretations of the predicates used in the atoms).  So those clauses do not add any further information and can be dropped.  From an operational perspective, such clauses would give rise to unsatisfiable (in M) subgoals, and so cannot possibly help to prove anything.  Similarly, if a clause contains “not A” where A is not in M, then the literal “not A” is true in M, and hence cannot contribute anything as an antecedent of an implication.  So that is why it is dropped from the clause.

If, after performing this simplification, the minimal set of atoms satisfying Pi/M is exactly M, then M is called a stable set for Pi.  In other words, if we suppose the atoms in M are exactly the true ones and simplify away all negated literals, then the resulting program’s minimal model is again M.  So our assumed knowledge M is consistent with what we can deduce from it using the program.  The paper gives examples of programs which have no stable sets, and which have more than one minimal stable set (i.e., a stable set none of whose proper subsets is stable).
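To make the definition concrete, here is a minimal Python sketch of the Pi/M operation and the stability check for finite ground programs.  The encoding of clauses and the names `Clause`, `clause`, `reduct`, `least_model` and `is_stable` are assumptions made for illustration, not notation from the paper, and the two sample programs are small examples chosen here rather than necessarily the ones the paper uses.

```python
from typing import FrozenSet, NamedTuple


class Clause(NamedTuple):
    head: str              # the single atom in the consequent
    pos: FrozenSet[str]    # atoms occurring positively in the antecedent
    neg: FrozenSet[str]    # atoms A occurring as "not A" in the antecedent


def clause(head, pos=(), neg=()):
    return Clause(head, frozenset(pos), frozenset(neg))


def reduct(program, m):
    """Pi/M: drop every clause containing "not A" with A in M,
    then drop all remaining negated literals."""
    m = set(m)
    return [Clause(c.head, c.pos, frozenset()) for c in program if not (c.neg & m)]


def least_model(definite_program):
    """Least set of atoms satisfying a negation-free ground program,
    computed by naive iteration to a fixed point."""
    model = set()
    changed = True
    while changed:
        changed = False
        for c in definite_program:
            if c.pos <= model and c.head not in model:
                model.add(c.head)
                changed = True
    return model


def is_stable(program, m):
    """M is stable for Pi when the least model of Pi/M is exactly M."""
    return least_model(reduct(program, m)) == set(m)


# "p if not q" and "q if not p": both {p} and {q} are stable.
two_stable = [clause("p", neg=["q"]), clause("q", neg=["p"])]

# "p if not p": no set of atoms is stable.
no_stable = [clause("p", neg=["p"])]
```

For `two_stable`, neither the empty set nor { p, q } is stable, so the two stable sets are incomparable.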

On Friday, we worked through the proof of Theorem 1 of the paper, which states that if M is a stable set of Pi, then it is also a minimal model of Pi.  This led us to consider a pair of propositions (which we did not take from the paper).  Suppose M is a stable set of Pi, and suppose M’ is another set of atoms.

Proposition 1: if M’ satisfies Pi/M, then M’ satisfies Pi.

Proposition 2: if M’ satisfies Pi, then M’ satisfies Pi/M.

We delighted ourselves by managing to refute Proposition 2 and prove Proposition 1.  For the refutation: a counterexample is if M is { p }, M’ is { q }, and Pi is {“p if not q”}.  Pi/M is then just { p }, since we will drop the “not q” from the antecedent of “p if not q”.  So M is stable, since the minimal model of Pi/M is M.  M’ does satisfy Pi, because by making q true, M’ makes “p if not q” vacuously true (since “not q” is false).  But M’ does not satisfy Pi/M, since it does not make “p” true.

For the proof of Proposition 1, we considered an arbitrary clause “A if As and not Bs” in Pi (where I am writing As for a vector of atoms and “not Bs” for a vector of negated atoms Bs).  Suppose that clause does not have a corresponding one in Pi/M.  This means that some Bi must be in M, since the operation Pi/M drops a clause if it contains a literal “not B” where B is in M.  Since M is a stable model, Pi/M must entail Bi.  If not, M would not contain Bi.  Since M’ satisfies Pi/M by the assumption of Proposition 1, this means Bi is also in M’.  So the literal “not Bi” is false in M’, the antecedent of the clause is false, and hence M’ also satisfies the clause we are considering from Pi.

Now suppose that the clause “A if As and not Bs” from Pi does have a corresponding clause in Pi/M.  The clause in Pi/M must be just “A if As”, since the transformation which retains a clause going from Pi to Pi/M just drops all negated literals from the clause.  Since M’ satisfies Pi/M by assumption, it satisfies this clause “A if As”.  Strengthening the antecedent by conjoining in some more literals cannot change that fact.  So M’ satisfies the original clause “A if As and not Bs” from Pi.  So M’ satisfies every clause in Pi, as required by Proposition 1.
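Both results can be checked mechanically on small ground programs.  The following Python sketch replays the counterexample to Proposition 2 and brute-forces Proposition 1 over every stable M and every M’ for one three-atom program.  The tuple encoding and all function names are assumptions made for this illustration, and the three-clause program is an arbitrary test case; passing it is a sanity check, not a substitute for the proof.

```python
from itertools import combinations

# A ground clause "A if As and not Bs" is encoded as the tuple (A, As, Bs),
# with As and Bs given as frozensets of atom names.


def reduct(program, m):
    """Pi/M: drop clauses with "not B" for B in M, then drop negated literals."""
    return [(h, pos, frozenset()) for (h, pos, neg) in program if not (neg & m)]


def satisfies(m, program):
    """m satisfies a clause unless its antecedent is true in m and its head is not."""
    def antecedent_true(pos, neg):
        return pos <= m and not (neg & m)
    return all(h in m or not antecedent_true(pos, neg) for (h, pos, neg) in program)


def least_model(definite):
    """Least model of a negation-free program, by iteration to a fixed point."""
    model, changed = set(), True
    while changed:
        changed = False
        for h, pos, _ in definite:
            if pos <= model and h not in model:
                model.add(h)
                changed = True
    return frozenset(model)


def subsets(atoms):
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]


def proposition_1_holds(program, atoms):
    """For every stable M and every M', check: M' |= Pi/M implies M' |= Pi."""
    for m in subsets(atoms):
        if least_model(reduct(program, m)) != m:
            continue  # M is not stable, so the proposition says nothing about it
        for m_prime in subsets(atoms):
            if satisfies(m_prime, reduct(program, m)) and not satisfies(m_prime, program):
                return False
    return True


# Counterexample to Proposition 2: Pi = {"p if not q"}, M = {p}, M' = {q}.
pi = [("p", frozenset(), frozenset({"q"}))]
m, m_prime = frozenset({"p"}), frozenset({"q"})

# An arbitrary slightly larger program for exercising Proposition 1.
bigger = pi + [("q", frozenset(), frozenset({"p"})),
               ("r", frozenset({"p"}), frozenset({"q"}))]
```

Here `satisfies(m_prime, pi)` is true while `satisfies(m_prime, reduct(pi, m))` is false, which is exactly the failure of Proposition 2 described above.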

As I, like many others of good will and high aspiration, battle the POPL ’16 submission deadline, I feel compelled to write a quick post about how to get your Agda code through the type-checker faster.  For my submission fight, I have got about 3000 lines of Agda that is intended to be compiled and executed (imagine!), and Agda is getting bogged down.  Compilations are taking productivity-killing minutes instead of seconds, or not completing at all!  Argh.  Here are three things I did that seemed to help (YMMV, of course):

1. Do not use anonymous lets in your code.  Always give a type for let-bound variables.  I had three nested anonymous lets in sequence, and they seemed to be slowing Agda down a lot.
2. Instead of nested tuples, use a new datatype.  Agda seems to have more trouble with (a1 , a2 , a3 , a4) than with (c a1 a2 a3 a4), where c is a new constructor.
3. Sadly, you may be forced to break up big mutually recursive functions.  I have got a doozy going, with maybe 12-15 mutually recursive functions.  Since they all have to go in one file (right?), we get a big Agda file, which seems to give Agda more trouble, generally, than a small Agda file.  The solution is to break up the mutually recursive functions.  To do this, you have to use {-# NO_TERMINATION_CHECK #-} right before the function declarations, and then you have to pass some of those functions explicitly to the other functions, to use for making the recursive calls.  So if f calls g and h, you put f into one file, and add two extra function arguments to f.  These are the functions to use when f wants to call g and wants to call h. Then you put the code for g and h in another file which imports the file containing f.  Where g or h calls f, now instead (f g h) should be called (since f is taking in g and h explicitly as arguments).
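The restructuring in item 3 is a form of open recursion, and its shape is easy to see outside Agda.  The sketch below is a Python analogy only (it is not Agda code, and Python has no termination checker to switch off); the functions `f`, `g` and `h` are hypothetical stand-ins that count Collatz steps, chosen because `f` calls both `g` and `h`, and each of those calls back into `f`.

```python
# --- conceptually "file 1": f no longer refers to g and h by name; ---
# --- it receives them as two extra arguments.                       ---
def f(g, h, n):
    """Number of Collatz steps needed to reach 1 from n (for n >= 1)."""
    if n <= 1:
        return 0
    # f delegates to whichever helper it was handed.
    return 1 + (g(n) if n % 2 == 0 else h(n))


# --- conceptually "file 2": imports file 1 and supplies g and h. ---
def g(n):
    # Where g used to call f directly, it now calls (f g h).
    return f(g, h, n // 2)


def h(n):
    return f(g, h, 3 * n + 1)


def collatz_steps(n):
    return f(g, h, n)
```

For example, `collatz_steps(6)` follows 6, 3, 10, 5, 16, 8, 4, 2, 1 and returns 8.  The price of the split is the same as in Agda: nothing checks any longer that the knot tied in the second file terminates.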

Fun fun!

In type theory and logic, predicativity is the property that quantifications do not create objects of the same class as the variable being quantified over.  So one cannot quantify over formulas and get a formula.  Instead, one gets a formula’ — some class distinct from formulas.  In type theory, predicative quantification means that an expression which is quantifying over types is not itself a type (Haskell works like this, which surprised me).  Often (for example, in both Coq and Agda), one sees a countable hierarchy of type universes: a quantification over Type(x) lives in universe Type(x+1).  Impredicativity allows one to stay at the same level.  So in Coq, which has an impredicative universe called Prop at the bottom of the universe hierarchy, quantifications over Prop are still in Prop.  From a type theory perspective, impredicative polymorphism is a quantification of types which still gives you a type, of the same kind that can be used for instantiating such quantifications.  Indeed, it is the possibility of instantiating a quantifier with the entire quantified formula itself that makes analysis of impredicativity challenging.  Challenging, but well understood: Girard discovered the way to prove normalization in the presence of impredicative polymorphism for his System F (see his co-authored book Proofs and Types).
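The distinction can be seen in a few lines of a proof assistant.  The sketch below uses Lean 4 rather than Coq, purely as a stand-in: Lean also has an impredicative `Prop` sitting below a predicative hierarchy `Type`, `Type 1`, and so on, though its syntax and universe names differ from Coq’s.

```lean
-- Impredicative: quantifying over every proposition still yields a proposition.
example : Prop := ∀ P : Prop, P → P

-- Predicative: quantifying over Type lands one universe up, in Type 1.
example : Type 1 := (α : Type) → α → α

-- The quantifier over Prop can be instantiated with the quantified formula itself.
example (h : ∀ P : Prop, P → P) :
    (∀ P : Prop, P → P) → (∀ P : Prop, P → P) :=
  h (∀ P : Prop, P → P)
```

The last example is the self-instantiation that makes impredicativity hard to analyze: the formula being quantified is itself one of the values the quantifier ranges over.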

Forget squabbles over comment characters (I do think I prefer Haskell/Agda’s now over OCaml’s — sorry!): people can really tangle over predicativity versus impredicativity.  I have heard second-hand that a well-known type theorist is against impredicativity because he distrusts a trick that can only be used once.  This refers to the fact that we cannot have a layered language with impredicativity in two layers: Girard proved this leads to inconsistency, with his analysis of System U; see the discussion by Coquand here.  This is why in Luo’s Extended Calculus of Constructions (ECC), which is part of the type theory of Coq, only the base level of the universe hierarchy is impredicative, and the other levels are predicative.

I have to say I find this criticism, that we should distrust tricks that can only be used once, unconvincing.  For impredicativity is a heck of a trick.  The best part is that it enables lambda encodings.  I am on a personal quest to rehabilitate lambda encodings, and I am working on new type theories that support dependently typed programming and proving with lambda encodings.  A first step along these lines is my work with Peng Fu on self types, which you can read about here.  This is not the place to go into all the details, but impredicativity is absolutely essential.  With only predicative quantification, there really is no way to make lambda encodings practical.

Furthermore, if impredicativity is the trick that can only be used once (and hey, let’s use it), predicativity is the trick you have to keep using again and again and again.  To avoid repeating code and datatype definitions at each level of the hierarchy (which one quickly finds will be needed for a lot of practical examples), we have to resort to level polymorphism.  Now a level-polymorphic type is not in any level we can access.  Why not just extend to higher ordinals?  Oh boy.  And of course, we have to solve level constraints that the type system imposes.  This complicates type checking.

So predicativity is out for me, including ECC-style predicative hierarchies over impredicative base levels.  But impredicativity is not getting off so easily.  Not for funny philosophical reasons or because we have to use it carefully — and we are treading the line here, as the more expressive we make our logic the closer we skirt the edge of inconsistency — but because it too has its own peculiar restrictions.  I have just been learning about these, and indeed, they are the impetus for writing this post.  Did you know that you cannot perform large eliminations on so-called impredicative datatypes in Coq?  And that these datatypes are the reason that Coq enforces strict positivity for datatype declarations (the datatype cannot be used to the left of an arrow, even if to the left of an even number of arrows, in the type for an argument to a constructor of that datatype)?  Well, if you are reading this post you probably did.  But I did not.  The counterexample is from a 1988 conference paper by Coquand and Paulin-Mohring.  Their presentation is a bit hard to read through for me, anyway, but thankfully Vilhelm Sjöberg transcribed it into Coq, with helpful comments, here.

Reading through this example has been challenging, not just because it is tricky and I still did not manage to get a great intuitive grasp of how it works.  But also because I have been worried it might apply to the type theory I am developing as the next step in the work on dependently typed lambda encodings.  And that would be very troubling, as I have a 20-page proof of consistency of that system!  Did I mess up my proof?  What features of ECC + inductive types are actually needed for that example of Coquand and Paulin-Mohring’s?  As Vilhelm wrote it, the example does use the predicative hierarchy of ECC, not just Prop.  I am not 100% convinced that it could not be carried out in Prop alone, though.

After much puzzling, I think I understand why this example would not affect the system I am working on, which allows impredicative nonstrictly positive datatypes, and for which I believe I will soon have an extension with large eliminations.  This sounds miraculous, because Coq forbids nonstrictly positive datatypes, and also forbids large eliminations of impredicative datatypes.  I am saying you could have both simultaneously and be consistent.  That must be wrong, right?  Well, wrong (maybe).  The difference is that I am working in a Curry-style type theory, similar to the Implicit Calculus of Constructions (ICC) of Miquel (with several additional features for dependently typed lambda encodings).  As an aside, while I have the highest admiration for Miquel’s work, his system has added several strange typing rules that prevent a reasonable metatheory and violate a number of aesthetic guidelines for type theory.  My system does not follow Miquel on those points (though we do not yet have the definable subtyping which is the most intriguing practical point of ICC, albeit to my knowledge not yet capitalized on in any system or even other papers).  In the system I am working on, terms are simply the terms of pure untyped lambda calculus.  All the action is in the types we can assign to such terms, and the kinding of those types.  For practical purposes, one will eventually need to design an annotated version of such a language, since inferring a type for a pure term of lambda calculus, even just in System F (a subsystem of my system), is undecidable.  But for metatheoretic analysis, this Curry-style development, where terms are completely unannotated, is fine.  In fact, it is more than fine, it is enlightening.  ECC is based on a Church-style approach to the syntax of terms.  Types are really legitimate parts of terms, that could be computed at run-time (though nothing interesting at run-time could be done with them).  This, by the way, is another point of ICC that I do not agree with: types can be parts of terms “really”, even though they do not have to be.

I am locating the central issue with the Coquand/Mohring counterexample to large eliminations with nonstrictly positive inductive types in the fact that the element of the inductive type in question can store a type (actually, something of type $(A \rightarrow \star) \rightarrow \star$).  This type is really like a piece of data inside the element of the inductive type.  It can be retrieved and processed, and this leads to the paradox.  Rather than blocking large eliminations with inductive types, I propose to shift the ground completely by using a Curry-style theory.  So types cannot be stored inside elements of datatypes, and thus cannot be retrieved and manipulated.  I believe this would block that counterexample, while still allowing both nonstrictly positive inductive types and large eliminations over impredicative inductive types.  I have a proof that this works for the first of these features, and I believe I shall have a proof for the second (large eliminations) in the next few months.

Hoping to have stirred the pot, I wish anyone reading to this point all the best in their investigations of type theory, or whatever other subjects deprive them of sleep and fire their minds.

Most QA9 readers will know the nonconstructive proof that there are two irrational numbers a and b such that a to the power b is rational.  There is interesting discussion about this here.  I recently came across a 2014 note by Roger Hindley about the history of how this example came to be more broadly known.  It is interesting reading.  Hindley credits a 1953 paper by Dov Jarden in Scripta Mathematica (volume 19, page 229) with the proof.  In case you do not have access to this journal in a local library, I have decided to include a scanned copy of the article below:

It’s a Math tweet from 1953.

In our Computational Logic seminar here at The University of Iowa, we are studying logic programming this semester.  We are using the very nice book “Logic, Programming, and Prolog”, freely available online.  We were talking today about the existence of a least Herbrand model for a definite program.  A definite program is just a set of clauses of the form $A_0 \leftarrow A_1,\ldots,A_m$, where each $A_i$ is an atomic formula (predicate applied to terms).  (Free variables in clauses are interpreted universally.)  If $m = 0$, then we just have an atomic fact $A_0$ in the definite program.  A Herbrand interpretation is a first-order structure where each function symbol $f$ of arity $k$ is interpreted as $\lambda x_1,\ldots,x_k. f(x_1,\ldots,x_k)$, and each predicate is interpreted as a subset of the set of ground (i.e., variable-free) atomic formulas.  A Herbrand model of a definite program P is then just a Herbrand interpretation which satisfies every clause in P.  It will be convenient below to identify a Herbrand interpretation with a subset of the set of all ground atomic formulas.  Such a subset determines the meanings of the predicate symbols by showing for which tuples of ground terms they hold.  We will pass tacitly between the view of a Herbrand interpretation as a first-order structure and the view of it as a set of ground atomic formulas.  The Herbrand base is the Herbrand interpretation corresponding to the set of all ground atomic formulas.  It says that everything is true.

What I want to talk about briefly in this post is the fact that the set of Herbrand models of a definite program P forms a complete partial order, where the ordering is the subset relation, the greatest element is the Herbrand base, and the greatest lower bound of a non-empty subset S of Herbrand models of P is the intersection of all the models in S.  In a complete partial order, every subset S of elements should have a greatest lower bound (though it need not lie in S).  Alternatively, and this is what I am interested in for this post, we can stipulate that every subset S should have a least upper bound.  The two formulations are equivalent, and the proof is written out below.  “Logic, Programming, and Prolog” contains a simple elegant proof of the fact that the intersection of a non-empty set of Herbrand models is itself a Herbrand model.

What I want to record here is the proof that in general, if in a partial order $(X,\sqsubseteq)$ every subset $S\subseteq X$ (including the empty set) has a greatest lower bound, then every such $S$ also has a least upper bound.  The proof I have seen for this is a one-liner in Crole’s “Categories for Types”.  It took me some puzzling to understand, so I am writing it here as much for my own memory as for the possible interest of others, including others from the seminar who watched me fumble with the proof today!

Let $S$ be a subset of $X$.  Let $\textit{ub}(S)$ be the set of elements which are upper bounds of $S$ (that is, the set of elements $u$ which are greater than or equal to every element of $S$).  The claim is that the greatest lower bound of $\textit{ub}(S)$ is the least upper bound of $S$.  By the assumption that every subset of $X$ has a greatest lower bound, we know that there really is some element $q$ which is the greatest lower bound of $\textit{ub}(S)$.  As such, $q$ is greater than or equal to every other lower bound of $\textit{ub}(S)$.  Now here is a funny thing.  Every element $x$ of $S$ is a lower bound of $\textit{ub}(S)$.  Because if $y\in \textit{ub}(S)$, this means that $y$ is greater than or equal to every element in $S$.  In particular, it is greater than or equal to $x$.  Since this is true for every $y\in \textit{ub}(S)$, we see that $x$ is a lower bound of $\textit{ub}(S)$.  But $q$ is the greatest of all such lower bounds by construction, so it is greater than or equal to the lower bound $x$.   And since this is true for all $x\in S$, we see that $q$ is an upper bound of all those elements, and hence an upper bound of $S$.  We just have to prove now that it is the least of all the upper bounds of $S$.  Suppose $u'$ is another upper bound of $S$.  This means $u'\in\textit{ub}(S)$.  Since by construction $q$ is a lower bound of $\textit{ub}(S)$, this means that $q \sqsubseteq u'$, as required to show that $q$ is the least of all the upper bounds of $S$.
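The construction in this proof can be tried out on a finite example.  The Python sketch below uses the divisors of 60 ordered by divisibility, a partial order in which every subset has a greatest lower bound, and computes least upper bounds only through `glb` and `upper_bounds`, exactly as in the argument above.  The choice of poset and all function names are assumptions made for illustration.

```python
from itertools import combinations

# The carrier X: divisors of 60.  The order: x is below y when x divides y.
X = [d for d in range(1, 61) if 60 % d == 0]


def leq(x, y):
    return y % x == 0


def glb(s):
    """Greatest lower bound of s in X, found by brute force."""
    lower = [x for x in X if all(leq(x, y) for y in s)]
    greatest = [l for l in lower if all(leq(k, l) for k in lower)]
    assert len(greatest) == 1  # every subset of this poset has a glb
    return greatest[0]


def upper_bounds(s):
    """ub(S): the elements of X above every element of s."""
    return [u for u in X if all(leq(x, u) for x in s)]


def lub_via_glb(s):
    """The claim of the proof: the glb of ub(S) is the lub of S."""
    return glb(upper_bounds(s))


def is_least_upper_bound(q, s):
    ubs = upper_bounds(s)
    return q in ubs and all(leq(q, u) for u in ubs)


def all_subsets(xs):
    return [list(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]
```

For instance, `lub_via_glb([4, 6])` is 12, and for the empty set the construction returns the least element 1, since every element of X is an upper bound of the empty set.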

The final interesting thing to note about the complete partial order of Herbrand models of a definite program P is that while the greatest lower bound of a non-empty set $S$ of models is their intersection, and while the greatest element is the Herbrand base (a universal Herbrand model), the intuitive duals of these operations are not the least element nor the least upper bound operation.  The intuitive dual of a universal Herbrand model would be, presumably, the empty Herbrand interpretation.  But this need not be a model at all.  For example, the definite program P could contain an atomic fact like $p(a)$, and then the empty Herbrand interpretation would not satisfy that fact.  Furthermore, if $S$ is a non-empty set of Herbrand models, $\bigcup S$ is not the least upper bound of $S$.  That is because $\bigcup S$ need not be a Herbrand model of P at all.  Here is a simple example.  Suppose P is the definite program consisting of clauses $\textit{ok}(h(a,b))$ and $\textit{ok}(h(x,y)) \leftarrow \textit{ok}(x),\textit{ok}(y)$.  Consider the following two Herbrand models $H_1$ and $H_2$ of this program P. In $H_1$ the interpretation of $\textit{ok}$ contains all the terms built using $h$ from $a$ and $h(a,b)$.  In $H_2$, the interpretation of $\textit{ok}$ contains all the terms built using $h$ from $b$ and $h(a,b)$.  If we take the intersection of $H_1$ and $H_2$, then it is a Herbrand model, in fact the minimal one: it says that $\textit{ok}(h(a,b))$ is true, as required by the first clause in P; and if two terms $t_1$ and $t_2$ are in the interpretation of $\textit{ok}$, then so is $h(t_1,t_2)$.  But if we take the union of $H_1$ and $H_2$, what we get is not a Herbrand model of P at all.  Because $H_1 \cup H_2$ contains $\textit{ok}(h(a,a))$ and $\textit{ok}(h(b,b))$, for example, but not $\textit{ok}(h(h(a,a),h(b,b)))$.  To get an upper bound of $H_1$ and $H_2$, it is not enough to take their union.  One must take their union and then close it under the deductive consequences of the program P.  That’s the intuition, though we would need to formally define closure under deductive consequences, and it would be a bit nicer to be able to apply a model-theoretic notion (since we are working model-theoretically here) rather than a proof-theoretic one.   Declaratively, we know we can get the least upper bound of a set $S$ of Herbrand models as the intersection of the set of all Herbrand models which are supersets of every model in $S$.  But this is rather a hard definition to work with.
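The same phenomenon already shows up in a finite, propositional setting, where both descriptions of the least upper bound can be computed and compared.  The Python sketch below is such an analogue and not the $\textit{ok}$/$h$ example itself, whose models are infinite.  The one-clause program “c if a, b”, the pair encoding of clauses, and all function names are assumptions made for illustration; `close` is one way to read “closure under deductive consequences” for a finite ground definite program.

```python
from itertools import combinations

# A definite ground clause "head if body" is encoded as (head, frozenset_of_body_atoms).


def is_model(atoms, program):
    """atoms satisfies a clause unless its body holds in atoms and its head does not."""
    return all(head in atoms or not body <= atoms for head, body in program)


def close(atoms, program):
    """Smallest superset of atoms closed under the clauses of the program."""
    closed, changed = set(atoms), True
    while changed:
        changed = False
        for head, body in program:
            if body <= closed and head not in closed:
                closed.add(head)
                changed = True
    return frozenset(closed)


def subsets(universe):
    universe = sorted(universe)
    return [frozenset(c) for r in range(len(universe) + 1)
            for c in combinations(universe, r)]


def lub_declarative(models, program, universe):
    """Intersection of all models of the program that contain every model in `models`."""
    result = frozenset(universe)  # the set of all atoms is always a model
    for s in subsets(universe):
        if is_model(s, program) and all(m <= s for m in models):
            result &= s
    return result


universe = {"a", "b", "c"}
program = [("c", frozenset({"a", "b"}))]   # the single clause "c if a, b"
h1, h2 = frozenset({"a"}), frozenset({"b"})
```

Here `h1` and `h2` are both models and so is their intersection, but `h1 | h2` is not, because it contains a and b without c.  Closing the union yields { a, b, c }, which agrees with `lub_declarative([h1, h2], program, universe)`.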

Anyhow, this is a nice example of finding an interesting abstract structure in semantics, as well as a good exercise in reasoning about such structures.