## Category Archives: Trellys

Datatypes are a conundrum for designers of type theories and advanced programming languages. For doing formal reasoning, we rely heavily on datatype induction, and so datatypes seem to be essential to the language design. And they are a terrible nuisance to describe formally, adding a hefty suitcase to the existing baggage we are already coping with. For example, in the “Sep3” Trellys design we have been developing for the past 12 months now, we are contending with a programming language with Type:Type and dependent types, plus a distinct predicative higher-order logic, with syntax for proofs, formulas, and predicates. This language is getting a bit big! And then one has to describe datatypes, and you need an Ott magician to make it all work (fortunately we do have one, but still).  What a hassle!

Can’t we just get rid of those darn datatypes?  The Pi-Sigma language of Altenkirch et al. and recent proposals of McBride et al. are out to do this, or at least minimize what one needs to accept as a primitive datatype.  In this post, I want to talk about the problems you encounter if you lambda-encode your datatypes, and why I think we might be able to get around these in Sep3 Trellys.

There are two terrible snarls that we face with lambda-encoding datatypes, and they both have to do with eliminations (that is, with the ways we are allowed to use data that we have constructed, or assume to have been constructed). The first snarl is multi-level elimination. Suppose you want to allow the construction of data by recursion on other data (standard), together with construction of types by recursion on data (exotic: “large eliminations”). Then you have a big problem right away with using lambda-encoded data, because lambda-encoding with either the Scott or Church encoding requires fixing a level for elimination of your data. Let me say more about this, looking for the moment just at the well-known Church encoding. In this encoding, a numeral like 2 is encoded as $\lambda s. \lambda z. s (s z)$. I am writing this Curry-style (without typing annotations) in the hopes of making it more readable. The type you want to assign to this term is then $\forall A:\textit{Type}. (A \rightarrow A) \rightarrow A \rightarrow A$. The numeral 2 has been encoded as its own iterator: given a function $s$ of type $A \rightarrow A$ for some type $A$, and a starting point $z$ of type $A$, the numeral 2 will apply $s$ twice starting with $z$. The problem with this encoding is that it is forced to pick a level of the language for $A$. Here, we said that $A$ is a type, so values computed by 2 must be at the term level of the language. So we cannot use this encoding of 2 if we want to compute at the type level. For that, we would need the type $\forall A:\textit{Kind}. (A \rightarrow A) \rightarrow A \rightarrow A$, and that is quite bad, since if we are going to have impredicative quantification over kinds as well as types, we will lose normalization (I don’t have a precise reference for this, but heard it through the type theory grapevine, attributed to Thorsten Altenkirch, that impredicativity is a trick we can only play once).
Anyhow, even if we had such impredicativity, we would need to assign multiple types to this encoded 2, and that sounds very worrisome: users are carefully crafting annotations to show a single typing, and it would become quite wild, a priori, to be crafting annotations to achieve multiple typings.
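To make the iterator reading concrete, here is a small untyped sketch of Church numerals in Python (my own illustration, not code from the Trellys project). Being untyped, Python happily eliminates the same numeral at any “level” of result, which is exactly the luxury a typed theory does not have:

```python
# Church numerals: a numeral n is its own iterator,
# applying s to z a total of n times.
zero = lambda s: lambda z: z
succ = lambda n: lambda s: lambda z: s(n(s)(z))

two = succ(succ(zero))  # behaves as: lambda s. lambda z. s (s z)

# Eliminating 2 at one kind of result: iterate successor on integers.
print(two(lambda x: x + 1)(0))         # 2

# The same numeral eliminated at a different kind of result: build a list.
print(two(lambda xs: ['s'] + xs)([]))  # ['s', 's']
```

In a typed setting, the quantifier $\forall A:\textit{Type}$ fixes once and for all which level these results may live at, which is the multi-level problem described above.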

The second snarl is dependent elimination. If we want to use Church-encoded 2 for writing dependently typed functions, or (what is the same thing, under the Curry-Howard isomorphism) for proving theorems, the simple polymorphic type we just considered is not good enough. From an encoded natural number $n$, we want to produce data of type $C\ n$, for some type constructor $C$. So instead of $\forall A. (A \rightarrow A) \rightarrow A \rightarrow A$, we really want something like $\forall C.\ (\forall n.\ (C\ n) \rightarrow (C\ (n+1))) \rightarrow (C\ 0) \rightarrow \forall n.\ (C\ n)$. But this looks problematic, as we are trying to define the $\textit{nat}$ type itself! What types can we possibly give to $C$ and $n$ in that definition? Their types need to use $\textit{nat}$ itself, which we are trying to define. Now we could use recursive types to try to get around that problem, but the situation is actually even worse: the pseudo-type I wrote is supposed to be something like the definition of $\textit{nat}$. That is, we would expect our encoding of 2 to have that type. But then we see that even as a pseudo-type, it does not make sense. We actually want 2 to have the type $\forall C.\ (\forall n.\ (C\ n) \rightarrow (C\ (n+1))) \rightarrow (C\ 0) \rightarrow (C\ 2)$. That is, its type should mention 2 itself! In general, we want a natural number $N$ to have type $\forall C.\ (\forall n.\ (C\ n) \rightarrow (C\ (n+1))) \rightarrow (C\ 0) \rightarrow (C\ N)$. How can something’s type mention the thing itself? This reminds me of very dependent function types (see Section 3 of Formal Objects in Type Theory Using Very Dependent Types by Jason Hickey, for example). It seems pretty exotic.
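For comparison, the type we are reaching for here is essentially the standard induction principle for natural numbers. Stated in Lean notation (my transcription, using Lean's built-in Nat only to exhibit the shape, with the cases ordered as in the text):

```lean
-- The dependent elimination principle we would like a Church numeral
-- to inhabit. Quantifying over C : Nat → Type pushes the whole thing
-- up to Type 1.
def NatElim : Type 1 :=
  ∀ (C : Nat → Type), (∀ n, C n → C (n + 1)) → C 0 → ∀ n, C n
```

The trouble described above is that for a *self-encoded* numeral $N$, the final `∀ n, C n` would have to specialize to `C N`, so the type of the encoding of $N$ mentions $N$ itself.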

Now in the context of our Sep3 Trellys design, there is actually a little light here through all these complex difficulties for lambda encodings.  First of all, we have Type : Type in the programming part of Sep3.  So we could hope that the problem with multi-level elimination would go away.  It actually doesn’t with our current design, because we have adopted a predicative higher-order logic on the logic side, which has actually forced us to add a universe hierarchy on the programming side.  So we have Type 0 : Type 0, and then Type 0 : Type 1 : Type 2 …  Now it occurs to me that all this trouble would just go away, and the language design would get much simpler, if we just bit the bullet and adopted an impredicative higher-order logic.  I immediately lose a lot of confidence that our logical language would be normalizing — but that’s what we do careful metatheoretic study for.  There’s no reason to assume that impredicativity will cost us consistency, so let’s not be pessimistic.

If we do not stratify the logical side of the language, we need not stratify the programming side, and so we won’t have multiple levels at which to eliminate an encoded piece of data. This would be great, as it would (almost) knock down the first objection to lambda-encoding datatypes. There’s another issue one would have to worry about, though, and that is eliminating data in proofs (since Sep3 distinguishes proofs and terms). I am going on a bit too long, and my thinking isn’t settled yet, but there’s one very nice thing we could hope for about eliminating data in proofs. Instead of using the Church encoding, we could use the Scott encoding. Here, 2 is encoded as $\lambda s. \lambda z. s\ 1$, where 1 stands for the Scott encoding of 1. Data are encoded as their own case-statements, rather than as their own iterators. Why is this relevant? Well, I have to note first that now we really will need recursive types of some kind for this, because intuitively, we want the (simple) type for Scott-encoded 2 to be $\forall A.(\textit{nat} \rightarrow A) \rightarrow A \rightarrow A$. Note the reference to $\textit{nat}$ in the type assigned to $s$. We need to use a recursive type to allow $\textit{nat}$ to be defined in terms of itself like this. But here’s the cool observation: Garrin Kimmell and Nathan Collins proposed for Trellys (both Sep3 and U. Penn. Trellys) that we base induction on an explicit structural subterm ordering in the language. We have an atomic formula $a < b$ expressing that the value of a terminating term $a$ is structurally smaller than the value of a terminating term $b$. The obvious but beautiful observation is that with the Scott encoding, we really do have $1 < 2$. Note that this is not true with the Church encoding, since $\lambda s. \lambda z. s\ z$ is not a subterm of $\lambda s. \lambda z. s\ (s\ z)$.
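The subterm observation is easy to see even in an untyped sketch (again Python, my own illustration): under the Scott encoding, case analysis on 2 hands back the very object that encodes 1.

```python
# Scott numerals: a numeral is its own case-statement.
# Applied to s and z, it returns z if it is 0, and otherwise applies
# s to its predecessor.
zero = lambda s: lambda z: z
succ = lambda n: lambda s: lambda z: s(n)

one = succ(zero)  # behaves as: lambda s. lambda z. s 0
two = succ(one)   # behaves as: lambda s. lambda z. s 1

# Case analysis on 2: the "successor branch" receives the predecessor.
pred_of_two = two(lambda p: p)(None)

# The predecessor is literally the subterm 1 (the very same object),
# which is the sense in which 1 < 2 holds structurally for Scott data.
print(pred_of_two is one)  # True
```

With the Church encoding no such identity holds: the encoding of 1 is not a subterm of the encoding of 2, as noted above.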

Ok, I’ve gone on too long and haven’t gotten to address the problem with dependent elimination yet.  But suffice it to say that lambda-encoding datatypes, which was always eminently attractive, is starting to look like a possibility for Sep3.

You could be forgiven for not knowing it, but we — at U. Iowa, U. Penn, and Portland State — have been hard at work now for almost two years on the language design for Trellys.  Two years seems like a long time — and I very much hope that we’re nearing completion of this language — but when you think about how long it has taken the Type Theory community to make each of its various advances, two years isn’t really much.  One can look at the challenge we are trying to meet in two ways:

1. extending the Curry-Howard isomorphism to support general recursion in a convenient way, while retaining a logically sound fragment of the language.
2. extending a logical theory of evaluation for a richly typed programming language so that programs can contain proofs from the logic.

We are concurrently working on two language designs, one of which is primarily aimed at (1), and the other at (2).  The U. Penn. team is primarily trying to extend the Curry-Howard isomorphism to accommodate general recursion, using a design internally called NoLaw (as in “Congress shall make no law … abridging the freedom of speech”); while we at Iowa are focused on combining a separate logical theory of program evaluation with dependency of programs on proofs, in a language we rather infelicitously call Sep-pp (or Sep3, for separation of proofs and programs).

Both the language designs we are exploring right now distinguish logical and programmatic fragments of the language.  The goal is to be able to prove normalization for terms in the logical fragment, while allowing the possibility of diverging computations in the programmatic fragment.  In Sep3, the big challenge (recently picked up also for the NoLaw design) is to allow programs to contain proofs.  This might sound innocuous, but think about what this means from the point of view of normalization.  Suppose you have a proof about a program that contains a proof.  For a small artificial example: maybe you are doing a case split in a proof on the form of the value of some programmatic term t of the following type:

```
data pf_holder : Type where
  mk_pf_holder : (F:Formula) -> (p:F) -> pf_holder
```

This datatype is maybe not that useful in practice, but we certainly want to allow datatypes like it, where term constructors can accept proofs of unknown formulas F as arguments.
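For readers who want to see the shape of such a datatype in a mainstream proof assistant, here is a rough Lean analogue (this typechecks in Lean because Lean's Prop is impredicative; Sep3's setting is different, so this only illustrates the shape, not Sep3 itself):

```lean
-- A datatype whose constructor packages a proof p of an arbitrary,
-- statically unknown proposition F.
inductive PfHolder : Type where
  | mk : (F : Prop) → F → PfHolder

-- Any proof of any proposition can be boxed:
example : PfHolder := PfHolder.mk (2 + 2 = 4) rfl
```

The point is that nothing about `PfHolder` reveals which proposition a given value carries, which is exactly what makes its semantic interpretation delicate later in the post.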

So imagine you are doing a case analysis inside a bigger proof P on the value of a programmatic term t of type pf_holder. Now, since some programmatic terms fail to have a value, in Sep3 we are insisting that to do a case analysis in a proof, you must supply a proof that the scrutinee (t in this case) is terminating. In Sep3, we have a primitive atomic formula t ! to express that t terminates. So suppose we have a proof that t terminates, and we do this case analysis. Then what is really happening is that, from the land of safety — this is the logical fragment, where all terms are safe in the sense that we are guaranteed that they normalize (this is the goal anyway) — we reach into the land of danger and potential divergence — the programmatic fragment, where this term t lives — and pull out from it a safe term again, in this case the subproof p for which we have t = mk_pf_holder F p. And now we can potentially use this safe term p inside of our bigger proof P.

It is this back and forth between programs and proofs, and particularly this scenario where a proof can reach into a program and pull out another proof and use it, that makes this language design quite challenging.  Guru also separates proofs and programs in a similar way to Sep3, but does not provide general support for this feature of taking proofs out of programs and using them in proofs.  So I think this is something really new.

I’ve taken a bit longer than I had hoped to set the stage for what I want to write about today, which is how to accommodate a predicative universe hierarchy in this setup.  I’ve generally wanted to avoid universe hierarchies, because, as Conor McBride wrote about a little while back, they do seem to involve (at least with our current knowledge) some irritating, rather inelegant mechanisms.  I definitely don’t have anything to contribute at the moment for trying to improve matters there.  But consider this situation.  In Sep3, we are proposing to have the logical fragment be a predicative higher-order logic with a form of structural induction.  Why take a predicative logic?  Because the meta-theory will be tremendously easier, and because since our intended application is practical program verification, having the profound proof-theoretic strength that comes from an impredicative logic is quite unnecessary.  We want to use this language to prove algebraic properties of list combinators, not consistency of strong logical theories.

So if we’re adopting a predicative higher-order logic, that means, briefly, that we stratify formulas (in our logical fragment) into levels, where a quantification over formulas of level i is forced to be itself at level at least i+1.  So a formula like (F:Formula_0) -> F, for a small example, is itself classified by Formula_1, because we need it to be up a level from the level of the type of its quantified variable. This does not require any great innovation, as this sort of idea is relatively well understood (at least enough to get something reasonably workable, as Conor was posting about).
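The bump in levels here is the same one seen in any predicative universe hierarchy. In Lean, for instance (whose Prop is impredicative, so the closest analogue uses its Type hierarchy instead), quantifying over all of `Type 0` produces something one level up:

```lean
-- Quantifying over types at level 0 yields a type at level 1,
-- mirroring (F:Formula_0) -> F being classified by Formula_1.
def PolyId : Type 1 := (A : Type 0) → A → A

#check PolyId  -- PolyId : Type 1
```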

What becomes tricky in our setting is how this universe hierarchy in the logical fragment affects the typing for the programmatic fragment. Because we have the possibility in Sep3 of pulling proofs out of data inside bigger proofs, when we go to do our normalization proof for the logical fragment (to prove it is consistent), we need a deep interpretation of programmatic datatypes. In the middle of defining our interpretation of types using reducibility, we will need to state that the interpretation of the type pf_holder is, roughly, the set of terms of the form mk_pf_holder F p, where p is in the interpretation of F. At this point, if we are not careful, we will lose well-foundedness of the definition of the interpretation of types, since there is no syntactic relationship whatsoever between F and pf_holder, so we cannot see that anything is decreasing where we appeal to the interpretation of F while defining the interpretation of pf_holder.

There are a number of technical issues here, and we certainly haven’t worked this all out yet. But part of our current idea is that we are going to have to stratify Type, on the programmatic side, as well. So we will write:

```
data pf_holder : Type_1 where
  mk_pf_holder : (F:Formula_0) -> (p:F) -> pf_holder
```

We will put pf_holder up one level, so that when we need to appeal to the interpretation of F, we can go from level 1 to level 0. This means that the interpretation of formulas and types will be indexed also by the level of the formula or type in question.

This leads finally to where I wanted to get to for this post. We have a predicative hierarchy of formulas Formula_0, Formula_1, ... on the logical side, and also on the programmatic side: Type_0 : Type_1 : Type_2 .... (Note that we are not making Formula_0 : Formula_1, since intuitively, Formula_i does not assert anything, and hence should not itself be considered a formula; we have other technical mechanisms for enforcing the hierarchy besides just making Formula_i : Formula_{i+1}.) But the really interesting thing is that for Type_0, we can allow very dangerous principles. In particular, we are going to make Type_0 : Type_0. It’s been one of our goals for Trellys all along to have Type : Type for the programmatic fragment, since we don’t need the programmatic fragment to be logically sound anyway (and it is known that adding Type : Type can lead to diverging terms). It is ok to put that principle at the lowest level of our programmatic hierarchy, because we do not need a deep interpretation at level 0 for programmatic types. This is because, by the way we have constructed the language, programmatic datatypes at level 0 cannot contain proofs of quantified or unknown formulas. They could contain proofs of known unquantified formulas, but that poses no problem for well-foundedness of our interpretation of types: we just make datatypes bigger than unquantified formulas in our well-founded ordering, and there is no circularity. The interpretation of unquantified formulas cannot possibly depend back on the interpretation of datatypes.

So that’s the picture: predicative hierarchies for Formula and Type, where we have the highly impredicative principle Type_0 : Type_0 at the lowest level. This is pleasantly and unexpectedly similar to the situation in Coq, say, where we have impredicative Prop at the lowest level, and then a predicative hierarchy Type_0 : Type_1 : ...  above Prop.

As a final note, just to attempt to ward off misconceptions: this similarity does not at all mean that we are inheriting Coq’s artificial distinction between Set and Prop. In Sep3 our distinction between Formula and Type is based on the profound difference between a guaranteed terminating proof — indeed, possibly a nonconstructive one, since we are including the ability to case-split on whether or not a programmatic term terminates — and a general-recursive program.