The omega-rule of lambda calculus is the principle that if for all closed terms t we have t1 t beta,eta-equal to t2 t, then t1 is beta,eta-equal to t2. This principle is not valid for all t1 and t2, due to the existence of so-called universal generators, which Gordon Plotkin used in a 1974 paper to construct a counter-example to the omega-rule.
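Stated as an inference rule (notation mine, with t ranging over closed terms):

```latex
% The omega-rule: equality of terms follows from equality of their
% applicative behavior on all closed arguments.
\frac{\forall t\ \text{closed.}\;\; t_1\,t \;=_{\beta\eta}\; t_2\,t}
     {t_1 \;=_{\beta\eta}\; t_2}\;(\omega)
```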

Working on Relational Type Theory (RelTT), I found I would really like a form of the rule, though, which happily is derivable. Suppose that t1 and t2 are closed normalizing terms, and that t1 t =beta t2 t for all closed t. Then indeed t1 =beta t2. The proof is easy. Since t1 and t2 are normalizing, they are beta-equal to some terms $\lambda\,x.t'_1$ and $\lambda\,x.t'_2$, respectively, with t1′ and t2′ normal. So we have that for any closed t, [t/x]t1′ =beta [t/x]t2′. Instantiate this with $\Omega$, which is $(\lambda\,x.x\ x)\ (\lambda\,x.x\ x)$, and write c1 for [Omega/x]t1′, and c2 for [Omega/x]t2′. Suppose first that x is not free in t1′. Then c1 is just t1′, so we have t1′ =beta c2. By the Church-Rosser property of untyped lambda calculus, this implies that c2 beta-reduces to t1′, since t1′ is normal. But this can only happen if x is not free in t2′ and t2′ is identical to t1′. Otherwise, c2 would contain a copy of Omega that cannot be eliminated, and hence would lack any normal form.

Now suppose that x is free in t1′ and in t2′. Again appealing to Church-Rosser, c1 and c2 must both beta-reduce to some common term q (since they are beta-equal). But every redex in c1 lies inside a copy of Omega, which reduces only to itself, so the only term c1 reduces to is itself; similarly for c2. So they must be identical. By a simple induction on t1′, the fact that c1 is identical to c2 implies that t1′ is identical to t2′. The crucial point is the case where t1′ is x: there t2′ must also be x, because otherwise t2′ would have to be Omega itself, which is ruled out because t2′ is normal. This is why the restriction to normalizing t1 and t2 is critical: it allows us to choose Omega as a term that behaves like a fresh variable when substituted into t1′ and t2′. It cannot equal any term that is already in t1′ and t2′. This is a hint of why universal generators (which, as near as I can understand, are diverging terms that can reach a term containing t, for any term t) yield a counter-example to the omega-rule: we cannot find any term that we can safely use as a stand-in for a variable, because all terms are generated.
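To summarize the second case of the proof above (notation mine; c1 and c2 abbreviate [Omega/x]t1′ and [Omega/x]t2′):

```latex
% Church-Rosser gives a common reduct q of c_1 and c_2; since each c_i
% reduces only to itself, all the terms coincide syntactically, and
% identity of the substitution instances forces identity of t_1', t_2'.
c_1 \;=_\beta\; c_2
  \;\Longrightarrow\;
c_1 \twoheadrightarrow_\beta q \twoheadleftarrow_\beta c_2
  \;\Longrightarrow\;
c_1 \equiv q \equiv c_2
  \;\Longrightarrow\;
t_1' \equiv t_2'
```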

I wanted to record this proof here, because the proof in Barendregt's book is quite involved, running for page after difficult page. It covers a much more general (and apparently much harder) situation, where t1 and t2 need not be normalizing. But for many applications in type theory, we have normalizing terms, and there this simple justification for the omega-rule can be used.

Well, when you are in a universe, you cannot get out — and you cannot even tell there is any out to get to. Take our physical universe, for example. We cannot leave it (physically, certainly), of course. And we cannot even perceive that there is an end to it (as space is expanding faster than the speed of light). So, it just is all there is, as far as we can physically observe. Amazingly, it is through reason that we can deduce that there is something beyond space, not through observation (although personally I think the big bang is pretty strong observational evidence for extra-universal reality).

In mathematics, we can think of sets closed under operations as universes. Take the set of natural numbers. There is no way to get out of this set just using basic operations like zero, successor, arithmetic operations, etc. Or take a greater ordinal, like epsilon-0 (which is the first fixed-point of the ordinal exponentiation function raising omega to a power). Within epsilon-0 (and ordinals are sets, so we can imagine being in it), you can add ordinals and exponentiate omega by them, and you will not leave the universe of epsilon-0. One needs disruptive technology — like forming the set of all unary functions on the naturals, or taking the fixed-point of a continuous function on the ordinals — to blow out of the universe.
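For the record, the closure properties of epsilon-0 alluded to here are standard and can be stated as:

```latex
% epsilon_0 is the least fixed point of alpha |-> omega^alpha,
% and is closed under addition and base-omega exponentiation:
\varepsilon_0 = \sup\{\omega,\ \omega^{\omega},\ \omega^{\omega^{\omega}},\ \ldots\},
\qquad \omega^{\varepsilon_0} = \varepsilon_0,
\qquad \alpha, \beta < \varepsilon_0 \;\Rightarrow\; \alpha + \beta < \varepsilon_0 \ \text{ and }\ \omega^{\alpha} < \varepsilon_0
```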

Being in a universe means it is hard to imagine you are missing anything. You are like a fish in the ocean. Land, what’s land? Everything is just water. Keep your tall tales!

Now, I raise all this because I have recently had occasion to program comparatively in Cedille (the proof assistant I and my group have developed over the past few years), and Coq. Coq is an amazing tool, with many extraordinary accomplishments and results to its name. Cedille is just a baby tool in comparison. But Cedille’s type theory provides something that to me is closer to a universe. There are natural things you can do, and you just do them uninhibitedly. Ok, it is true that you cannot form negative datatypes. That is a hitch, and it makes it feel quite a bit less like a universe. In fact, I am working on getting rid of that restriction, in relational type theory (a new system I am working on). But in Coq, there are just so many more pointy edges. Make sure that datatype is strictly positive, not just positive! Be careful you do not perform large eliminations over large-indexed types! (Huh?) The latter caused me a bunch of headaches recently. Don’t recurse on anything unless it is a structural subexpression of a pattern. Picky, picky. Just check out the error list and see if it feels a bit spiny.
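To make the positivity hitch concrete, here is the classic demonstration (my own rendering, in Haskell) of why Coq insists on strict positivity: a datatype with a recursive occurrence to the left of an arrow lets you type a fixed-point combinator, and hence general recursion, with no recursive definitions at all. Haskell happily accepts this; Coq's positivity checker rejects the analogous Inductive, since in a logic it would yield inconsistency.

```haskell
-- A negative datatype: Rec a occurs to the left of an arrow in its
-- own definition. Coq would reject the analogue of this declaration.
newtype Rec a = In { out :: Rec a -> a }

-- The untyped Y combinator, typed via the negative datatype; note
-- that there is no recursion anywhere in these definitions.
fix' :: (a -> a) -> a
fix' f = (\x -> f (out x x)) (In (\x -> f (out x x)))

-- General recursion obtained from fix', e.g. factorial:
fact :: Integer -> Integer
fact = fix' (\rec n -> if n <= 0 then 1 else n * rec (n - 1))
```

The names Rec, fix', and fact are mine; the encoding itself is folklore.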

But, you will say (putting on your defender-of-Coq hat): those restrictions are all there to prevent things like logical inconsistencies that can arise without them. To which I respond: just the same way traffic lights and stop signs are there to prevent accidents. We need those. But there are no one-way streets in the ocean. You just swim. There are currents, ok. But nothing stops you.

My point is not to put down any tool, especially one as awesome as Coq. My point is to draw attention to a goal I think we should have as type-theory designers and implementers. The type theory must of necessity have some limitations; that is fundamental, with Goedel’s incompleteness theorem being one reason. But working within the theory, it should be hard to imagine its limitations. It should feel like you can do anything. You can use the tools the theory provides in any way you want. This is not true in Coq (or Cedille either, given the positivity restriction for datatypes). It feels more like you are a citizen in a modern democracy. You can do a lot of things. But not just anything. I can take a walk. That is ok. I cannot take a walk through my neighbor’s house uninvited. That is not allowed. There are, for very good reasons, loads and loads of rules and restrictions on what you can do. In the human world, we need rules and laws. But in the formal world, we would love to have total freedom.

So let’s strive to design type theories that feel like universes. If we do not, we will suffer the fate we deserve: to mistake our tangle of ordinances for the ocean.

Break times for me are often times to do some coding, and this summer was no exception. I decided to start working on a new version of Cedille, dubbed Cedille Omega because the change to the core type theory is so large as to warrant skipping an infinite number of version numbers from Cedille 1. Ha. Naturally, I was expecting to have something working by the end of the summer, and I do! I can correctly process comments.

Oh my. Did I really spend a month or so on comments? Well, my work time is in a vise due to childcare under the pandemic, but still that seems extravagant. And there is more than just comment support, because I set up an emacs mode communicating with a backend written in Haskell, from scratch. Some things I did not know too well how to do, like set up a bigger Haskell codebase with cabal (with my own nested modules, for example). This was not hard but engineering takes time. The elisp side is more “interesting” because of dynamic typing and elisp weirdness.

But to comments: in Cedille Omega, there are three kinds of comments:

• comment to end of line
• comment to start of line
• nestable comment

Comment to end of line begins with --, as in Haskell. It can also begin with -., to match comment to start of line, which ends with .- and is thus the mirror image. Nestable comments are delimited by {- and -}, as in Haskell. I have not seen comment to start of line elsewhere, but it seems like an obvious complement to comment to end of line.

I implemented, in the Haskell backend, what I am calling “blanking semantics” for comments. Blanking means to replace the whole comment, including start- and end-comment characters, with spaces. This is handy for indentation-sensitive parsing of non-comment text. There are some conflicts between the three kinds of comments, which are resolved by applying blanking to the different types of comment in order. So first we blank comments to end of line. This erases any start- or end-comment characters of the other comment types that fall within them. So if you have a line like

hi -- there .-

the -- will win out over the .-, and blanking will leave only the “hi”.

Finally, dangling start- or end-characters for nestable comments (which can indeed be nested, as in Haskell) comment out text to the end of the file or to the beginning of the file, respectively.

This semantics was easy to implement in three passes in Haskell. Because blanking comments to start of line seemed easier with the string reversed, my code uses one core function that blanks -. comments to end of line and returns the reversed string as its result. So you blank -. comments, getting a reversed string, and then blank -. again, which has the desired effect of blanking comments to start of line (since .- reads as -. in the reversed string). Then a recursive function blanks nestable comments still standing. A little extra logic is needed to watch for dangling end-characters for nestable comments.
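The pass structure can be sketched as follows (my reconstruction of the idea, not the actual Cedille Omega code; function names are mine):

```haskell
-- Blank "--" comments: everything from "--" to the end of the line is
-- overwritten with spaces, so the column positions of remaining text
-- survive for indentation-sensitive parsing.
blankEol :: String -> String
blankEol ('-':'-':rest) =
  let (comment, rest') = break (== '\n') rest
  in replicate (2 + length comment) ' ' ++ blankEol rest'
blankEol (c:cs) = c : blankEol cs
blankEol []     = []

-- Core pass for the mirrored forms: blank "-." comments to end of line,
-- returning the REVERSED string. The first application blanks "-."
-- comments; in the reversed output a ".-" comment reads as "-.", so a
-- second application blanks comments to start of line and restores the
-- original orientation.
blankRevPass :: String -> String
blankRevPass = reverse . go
  where
    go ('-':'.':rest) =
      let (comment, rest') = break (== '\n') rest
      in replicate (2 + length comment) ' ' ++ go rest'
    go (c:cs) = c : go cs
    go []     = []
```

For example, blankRevPass (blankRevPass "a .- b") blanks the "a .-" prefix, leaving only " b" (padded with spaces).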

During blanking, the code builds up character ranges for the different kinds of comments.  The backend then communicates these to the front end, also indicating which kind of comment it is.  The latter information is useful for setting stickiness of text properties in emacs.  You can assign characters text properties, which are retained as text is manipulated (moved, copied, etc.).  I decided to try using the category property, which gives a layer of indirection: you assign the properties you want (like face, which controls the display of the character) to an elisp symbol, and then make that symbol the value of the category property for a piece of text.  Then all the properties you bound to the symbol are applied to the text.  Nifty.  All comments get the same face for comment highlighting.  But I made different categories for the different kinds of comments, so I could set the stickiness of text properties differently for them.  Text properties in emacs can be front- or rear-sticky, which means that they are inherited by new characters inserted just before or just after, respectively, the text with the properties.  Comments to end of line should be rear sticky but not front sticky, and vice versa for comments to beginning of line.  Nestable comments should be front and rear sticky inside the delimiters, but start-characters should not be front sticky and end characters should not be rear-sticky.  So when you type new text right next to a comment, the comment highlighting is inherited or not, appropriately.

Just posting this in case you, too, care to comment.

A little while ago, I posted a challenge to come up with examples where optimal beta-reduction would reduce a term efficiently, but call-by-need evaluation in Haskell (with optimizations on) would be inefficient. A little while ago? Wow, it was last summer! Well, summer time for me has become time to think about optimal beta-reduction. What a hobby, right? And the semester is ending here at U. Iowa, so here it goes again.

Anyhow, I have a little bit of a new challenge — and this time a solution to it (instead of needing help to find a solution: thank you friends!). The examples I got last time (from Victor Maia, many thanks) were all ones that reduce efficiently with optimal beta, but not with call-by-need. You can watch Haskell grind and grind on examples that are instantaneous with optimal beta. But these examples are — or should be; more on that shortly — efficiently executable with normal-order reduction [edit: actually the strategy is a little more than that; see below]. This means that theorem provers like Coq and Agda should be able to reduce them efficiently.

Ok, so the new challenge is to find an example that is inefficient with normal-order reduction (and hence would certainly also be in Haskell), but would be efficient with optimal beta. In a sense Victor already solved this challenge with his very inspiring Medium posts of a couple years back (like this one), as these are examples that execute asymptotically much faster with optimal beta than with call-by-need, and normal-order reduction does not help as they are actually not higher order. But I am hoping for a little simpler example, exhibiting exponential behavior under normal-order reduction. Also, I would like to compare with normal-order reduction explicitly.

The solution (suggested by Victor) to my first challenge is this one:

busy :: Int -> Int -> Int
busy 0 = \ x -> x
busy n = \ x ->
  let f = busy (n - 1)
  in f (f x)


Asking ghc to evaluate something like busy 34 0 will take a while; call-by-need evaluation is exponential in n. But busy n is always just the identity function, and optimal-beta will take time linear in n. But (and this realization is what spurred my interest in the new challenge) normal-order evaluation will also be linear in n, because we will reduce busy (n - 1) to the identity function, and then f (f x) reduces in just two beta-reductions to x. [edit: for this to work, we need to use a version of normal-order reduction that evaluates let-terms strictly; let x = t1 in t2 needs t1 to be evaluated first, then t2]

I tried this out in Cedille, which implements [this version of] normal-order evaluation, and sure enough, the runtime is extremely fast, linear in n. Our interpreter is very naive, so evaluating busy 64 would definitely explode if evaluation were exponential — but it completes in a second or two.

Now what surprised me is that this example takes exponential time in both Coq and Agda! Weird, right? Why is that? I am puzzled, and emailed the Agda list to see if anyone has any ideas. [edit: Agda inlines let-definitions, causing the example to blow up.]

But here is a modified version of the example that makes Cedille choke (and certainly Coq, Agda, and I confirmed Haskell as well):

type Id = forall a . a -> a

busy :: Int -> Id -> Id
busy 0 = \ x -> x x
busy n = \ x ->
  let f = busy (n - 1)
  in f (f x)


Evaluating busy n builds a lambda term which takes in x and then builds a balanced binary tree of applications of x; the tree's depth doubles each time n increases by one. For example, Cedille’s normal-order evaluator will show this result for busy 2:

λ x . x x (x x) (x x (x x)) (x x (x x) (x x (x x)))

Here we see a nice example of the power of optimal-beta, because optimal-beta will represent this exponentially deep term in linear space! And applying busy n to the identity function will reduce in linear time. The key point about this example, beyond the first version of busy, is that it copies what the Asperti and Guerrini book call “virtual redexes”, namely applications of variables. Those variables could become instantiated later, creating redexes, and we had better not copy any redexes (ever!) if we want to be optimal. So copying a virtual redex could lead to non-optimality.
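As a sanity check on that output, here is a tiny Haskell model (mine, not part of Cedille) of these normal forms: busyNf mirrors the second busy, but keeps x as an uninterpreted leaf V, so we can count how many copies of x the result contains.

```haskell
-- Model of the normal form of busy n x: applications build a tree,
-- with x left uninterpreted as the leaf V.
-- busy 0 x = x x;  busy n x = f (f x) where f = busy (n - 1).
data Tm = V | App Tm Tm deriving (Eq, Show)

busyNf :: Int -> Tm -> Tm
busyNf 0 x = App x x
busyNf n x = busyNf (n - 1) (busyNf (n - 1) x)

-- Number of occurrences of x in the result.
leaves :: Tm -> Integer
leaves V = 1
leaves (App a b) = leaves a + leaves b
```

For busy 2 this yields a tree with 16 leaves, matching the sixteen occurrences of x in the term displayed above; the leaf count squares each time n increases by one.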

On many PL researchers’ lists of the most important papers ever in PL is “Types, Abstraction, and Parametric Polymorphism” by the great John C. Reynolds.  I have been studying the paper rather intensively the past month or so, and would like to share some thoughts (and even a legend for the paper’s notation, at the end of the post!).

In this paper, Reynolds sets out to characterize mathematically what parametric polymorphism is.  Part of the idea is the familiar goal (in Computer Science) that different implementations of the same interface should be interchangeable: as long as they behave the same with respect to the interface, then no other piece of code should be able to tell the difference between them.  Another part of the idea is that type polymorphism can be used to enforce this, as different clients of some interface cannot discover details about the implementation that are hidden behind some type abstraction.

To move these ideas from the realm of intuition to theorems, Reynolds defines two semantics for types.  The first interprets types as sets.  For example, a type like $\textit{Nat} \to \textit{Nat}$ is interpreted as the set of all set-theoretic (mathematical) functions from natural numbers to natural numbers (assuming the type Nat gets interpreted as the set of natural numbers).  Programs in the polymorphic typed lambda calculus he considers are then interpreted as elements of those sets.  For example, $\lambda\ x : \textit{Nat}.\ x$ will be interpreted as the set-theoretic identity function on the set of natural numbers.  This part of the paper is not too surprising.  The type structure of the language enables a straightforward set-theoretic semantics for typed lambda calculus, at least until it comes to impredicative polymorphism.  Here Reynolds speculates in the paper that the semantics will work out, but discovers the next year that it does not (a problem shortly corrected by others who use a different background mathematical theory which avoids the problems raised by impredicativity).  But the paper actually starts out just with a simply typed language, where things work out cleanly, and then adds on polymorphism (where things do not).

The second semantics for types interprets them as binary relations on the sets given by the first semantics.  The relations are the ones familiar from the setting of logical relations.  For example, the relation associated with $\textit{Nat} \to \textit{Nat}$ relates two unary operations on the set of natural numbers iff those functions map related inputs to related outputs (related by the relation for Nat, which Reynolds will take to be the identity relation).  So they should map equal numeric inputs to equal numeric outputs.
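In symbols (my notation, not Reynolds's exact one): if $R_A$ and $R_B$ are the relations interpreting types $A$ and $B$, then the relation for $A \to B$ is given by

```latex
% Two functions are related iff they map R_A-related arguments
% to R_B-related results:
f \;R_{A \to B}\; g
  \;\iff\;
\forall a, a'.\;\; a \;R_A\; a' \;\Rightarrow\; f(a) \;R_B\; g(a')
```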

The part of the paper that was driving me crazy is the Identity Extension Lemma.  This says that if you have a type and you interpret all its free type variables as identity relations (on the sets that the first interpretation assigns to those type variables), then the interpretation of that type will also be the identity relation.  So identity relations are extended by the semantics from type variables to all types.
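Schematically (again in notation of my choosing): writing $\mathcal{S}[\![A]\!]$ for the set interpretation of $A$ and $\mathcal{R}[\![A]\!]_\eta$ for its relational interpretation under an environment $\eta$, the Identity Extension Lemma says

```latex
% If eta_id maps each free type variable of A to the identity relation
% on that variable's set interpretation, then the relation for A is
% itself the identity relation on A's set interpretation:
\mathcal{R}[\![A]\!]_{\eta_{\mathrm{id}}} \;=\; \mathrm{Id}_{\mathcal{S}[\![A]\!]}
```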

Now, the vexing question to me was, why are we setting up all this machinery to define a bunch of identity relations?  There is one formalistic answer, which is that when it comes to polymorphism, Reynolds achieves Identity Extension by the expedient (not to say hack) of making the first interpretation of types invoke the second one (i.e., the relational one), to say that the set interpretation of a polymorphic type is the set of set-theoretic functions which satisfy Identity Extension.  Voila, Identity Extension holds now for polymorphic types!

But now the real answer — and I wish Reynolds had given more hints about this, or if he did, that I noticed them better: the point of the paper is not the relational semantics, though this is technically the most interesting part.  Really, we are just trying to define the set-theoretic semantics for polymorphic types, in such a way that all the functions we intend to be extensionally equivalent are soundly mapped down to the exact same mathematical object.  So we want to interpret mergesort and quicksort as the unique set-theoretic list-sorting function.  This is why Identity Extension is so important: it shows we did this correctly!  All the interesting equivalence has gotten squashed down in that first semantics.  Thus, the second semantics doesn’t have anything interesting to add.  But it is there as a specification of what we were trying to achieve.  For it is the description of the extensional equality we are trying to mash down with the first semantics.

The fact that the relational semantics is required by the set interpretation of polymorphism is, to me, a total cop-out.  It means that we only achieved the squashing down we wanted by including that extensional equality in our set interpretations.  So for the difficult part of the matter (i.e., polymorphism), it is as if we just said that the set interpretation of type A is the set of equivalence classes of terms, where the equivalence in question is the one given by the relational semantics.  If our goal was to achieve a set interpretation that satisfied that extensional equivalence (i.e., Identity Extension) but was independent of it, then at this point we failed.

Now, this is just the seminal paper in a pretty technically involved line of research.  It looks, though, in more modern treatments, as if the basic set-up is accepted: two semantics, mutually dependent.  That is clearly flawed, if the goal is as I just described (set interpretations which are independent of but satisfy the relational interpretations).  But perhaps that is not the goal of these more recent works?

If you think I am mistaken in any of the above, please perform your charitable deed of the day and correct me!  I am not an expert, just someone trying to understand…

As a service to other students of this paper, I am providing a short legend (reynolds-key) of notation for the first bit of it.  I will probably update this after our next reading-group meeting, when another member of the group is planning to add to it.