Well, I am embarrassed at how late I am in posting the solution to the puzzle I mentioned in my last post.  It has been a busy summer with taking care of our sweet new baby at home, and running StarExec development at work.  Anyhow, below is a graph with the minimum number of nodes which contains every legal possible combination of properties from termination (aka strong normalization), normalization (aka weak normalization), confluence (aka Church-Rosser), and local confluence (aka Weak Church-Rosser), and their negations.  This graph was found by Hans Zantema, whom I asked about this puzzle by email (he agreed to let me share his solution here).  Furthermore, he argues that 11 is the minimal number of nodes, as follows.  Out of the 16 possible combinations of properties SN, WN, CR, and WCR and their negations, we exclude immediately the combinations with SN and ~WN (since SN implies WN) and CR and ~WCR (since CR implies WCR).  So there are three legal possibilities for the values of CR and WCR, and three for the values of SN and WN.  These are independent, so there are 9 legal combinations of properties.  Now, Hans argues, since there is a node X which is SN and ~WCR, there must be two nodes which are SN and CR.  For since X is SN but not WCR, it has two children (which are still SN) which cannot be joined.  We may assume these children are themselves CR, otherwise we could repeat this observation and the graph would not be minimal.  Similarly, since there is a node which is ~WN and ~WCR, there must be two nodes which are ~WN and CR.  So there must be at least 11 nodes.  And the graph below has 11 nodes.  To test your knowledge, you can try to identify which combination of properties each node has!  Fun!

Suppose we have a graph (A,->) consisting of a set of objects A and a binary relation -> on A.  This is a simple case of an abstract reduction system, as defined in the Terese book (in the more general case, we have not just one relation ->, but an indexed set of relations).  In the theory of abstract reduction systems, an element x is confluent iff whenever there is a path from x to y and a path from x to z, then there exists some element q which is reachable from both y and z.  An element x is locally confluent iff whenever there is an edge (not an arbitrary path) from x to y and an edge from x to z, then there is some element q reachable from both y and z.  So confluence implies local confluence, but (rather famously) the reverse implication holds only for terminating systems.  An element is terminating iff there are no infinite paths from that element.  An element is normalizing iff there exists a path from that element to a normal form, which is an element that has no outgoing edges.  So terminating implies normalizing.

We have these four properties: confluence, local confluence, termination (sometimes also called strong normalization), and normalization (sometimes called weak normalization).  What is the smallest graph that is property diverse, in the sense that for every consistent combination of properties, the graph contains an element with that combination of properties?  (The consistency requirement for the set of properties for an element arises because confluence implies local confluence and termination implies normalization).

I will post the answer to this (with a nice picture) Monday…

It has been a very long time since I posted here — life has been busy, including a new baby at home.  But I really want to share about my recent experience tackling several performance problems in Agda.  Agda, as I hope you know already, is a very elegant dependently typed pure functional programming language.  It supports Unicode, so you can write → instead of -> for function space; and many other cool symbols.  It has user-defined mixfix notation, so you can define if_then_else_ (you write underscores in Agda to show where the arguments go) with the expected syntax.  It compiles to Haskell, although I get the impression that many people just use Agda as an advanced type checker, and do not bother compiling executables.  Agda has very good inference for implicit arguments, which can help make code shorter and more readable.  It also features definition of terminating functions by recursive equations.  So you can write beautiful definitions like the following for infix vector append:

_++𝕍_ : ∀ {ℓ} {A : Set ℓ}{n m : ℕ} → 𝕍 A n → 𝕍 A m → 𝕍 A (n + m)
[] ++𝕍 ys = ys
(x :: xs) ++𝕍 ys = x :: (xs ++𝕍 ys)

You might object to naming the function _++𝕍_ instead of _++_, but Agda does not support type classes or other approaches to operator overloading, and I prefer never to have to worry about symbols clashing from different included files.  This definition is from my standard library (not the Agda standard library; if prompted for username and password, enter “guest” for both).  There is a very nice emacs mode, too.

With all these great features, working well together, Agda provides just about the most elegant programming experience I have had in 21 years of coding.  I think it is a fantastic language, with much to emulate and learn from.  These accolades aside, my purpose in this post is to discuss some grotesque workarounds for performance problems inherent in the implementation and maybe even the language.  To be direct: Agda’s type checker has abysmal performance.  Suppose we create an Agda source file defining test to be a list containing 3000 copies of boolean true.  Agda takes 12.5 seconds to type check this file on my laptop.  If we give the same example to OCaml, it takes 0.2 seconds, in other words, heading towards two orders of magnitude slower.  Now, is Agda’s type checker doing fancier things than OCaml’s?  Undoubtedly.  But not on this example!  I am willing to accept some overhead in general for fancier type checking even on code that does not use fancy types.  And OCaml has been around for quite some time and is engineered by one of the best language implementors on the planet.  Fine.  So let Agda be 2 times slower.  Let it be 5 times slower.  But 60 times slower?  That is not good.

I ran into the performance issues with Agda’s type checker while tilting at the windmill I’ve been dueling off and on the past three years: parsing by rewriting.  Without going into detail here, let me just say I’ve been working, with very patient colleague Nao Hirokawa, to create a new approach to parsing based on term rewriting.  The idea as it stands now is that one runs an automaton approximating the language of your CFG over the input string, to create a formal artifact called a run.  Then one applies confluent run-rewriting rules to this run, where those rules are derived from the productions of the grammar and will rewrite every string in the language to a term built from the start symbol and then containing the parse tree as a subterm.  I love the approach, because it is inherently parallelizable (because the run-rewriting rules are confluent), and because we can resolve ambiguities in the grammar by adding rewrite rules.  The trickiest part is coming up with the right approximating automaton, and this is still not at all ready for prime time (despite the fact that I inflicted it on my undergraduate class this semester).

Anyhow, since I have been teaching undergrad PL this semester using Agda (more on this in a later post), and since Agda does not have a parser generator (perhaps because FPers seem to prefer parser combinators), I decided I would make my parsing-by-rewriting tool, called gratr, target Agda as a backend.  After a fair bit of hacking I had this working, only to discover that for even pretty small grammars, I was generating several thousand line Agda source files, which Agda absolutely could not handle.  Imagine my disappointment!  Beautiful (if not yet ready for big grammars) approach to parsing, targeting Agda, and the Agda type checker could not handle my generated parsers’ source files, except for the tiniest of grammars.  I was depressed, and determined to find a way around this problem to get working parsers from the medium-small grammars I wanted to use for class, as well as research.

Enter –use-string-encoding.  A little experimentation revealed that while Agda chokes checking even very simple terms when they get even moderately big, it will affirm that double-quoted strings indeed have type string in time seemingly independent of string size.  Oh the fiendery.  Let us encode all our parsing data structures — that means automaton and run-rewriting rules both — as strings, get them past the Agda type checker, and then decode at runtime to get our parser.  It is gross, it is nasty, but it might just work.  Of course, no one wants to decode their parsing data structures every time a string is parsed, but that was the price I decided I’d be willing to pay to get my parsers running in Agda.

I spent a month or so — while learning to care, with my wife, for a very fussy newborn — implementing this.  I finally had code in gratr to dump all the data structures as strings, and code in Agda to decode those strings and plug them into my existing parsing-by-rewriting infrastructure.  Agda could type check the file containing the strings in a second or two, even when the strings were huge (megabytes-long files, due to the unfortunately large automata my approach is currently producing).  The moment of truth arrives: let us actually compile the entire shebang to an executable (not just type check that one file).  Agda type-checking chokes.  I cannot believe what I am seeing.  What is happening?  I can type check the files containing the string-encoded data structures almost instantly, but type-checking the wrapper file defining the main entry point (which is just based off the way Haskell sets up code for compilation to executables) is running seemingly forever.  A little quick experimentation reveals: big strings encoding the data structures makes type checking that file super slow.  What gives!  Further head scratching leads me to suspect that for some reason, when Agda is instantiating a module somewhere in my setup, it is actually trying to normalize the fields of a record, where those fields are calling the decode functions on the string encodings.  This is the step that could take a second or two at runtime, with ghc-optimized executables, but will likely take forever with Agda’s compile-time interpreter.  How to make Agda stop doing this?

Here’s a hack for this:

 postulate
runtime-identity : ∀{A : Set} → A → A
{-# COMPILED runtime-identity (\ _ x -> x ) #-}

The idea is that we will interpose the postulate called runtime-identity to block the redex of decoder applied to string encoding.  At compile time, Agda knows nothing about runtime-identity, and hence will not be able to reduce that redex.  But at runtime, runtime-identity will be compiled to the identity function in Haskell, and hence the redex will reduce.

Delightfully, this worked.  Compiling the emitted parsers with the string encodings is now very quick, maybe 10 seconds to get all the way through Agda to Haskell to an executable.  Awesome!  Now let’s just run the executable.  Heh heh.  There is no way that could not work, right?

Wrong, of course.  Running the executable on a tiny input string takes 7 minutes and then I kill it.  Oh my gosh, I am thinking.  I just spent 5 weeks of precious coding time (in and around other duties, especially new childcare duties, with my patient wife looking on and wondering…) to get this gross hack working, and now the runtime performance is unacceptable.  I almost despair.

But hark! Reason calls.  It cannot be taking a ghc-optimized executable that long to decode my string-encoded data structures.  After all, encoding the data structures to strings from my OCaml gratr implementation is instantaneous.  Sure, decoding could be a bit longer, but forever longer?  That can’t be right.  So how can we figure out what is going on?

Easy: profile the compiled code with ghc’s profiling features.  Agda just compiles down to (almost unreadable) Haskell, which is then compiled by ghc, so we can just profile the code with ghc.  I had never used ghc’s profiler, but it was very simple to invoke and the results were easily understandable.  Where is the time going?  Here is the scary line:

d168 MAlonzo.Code.QnatZ45Zthms 322 7928472 10.9 36.0 55.0 53.5

The last numbers are showing that over half the time of the executable is going into function d168 in nat-thms.agda.  A function in nat-thms.agda?  That contains a bunch of lemmas and theorems about natural-number operations.  I hardly expect my parser to be grunting away there.  What is d168?  Well, it is the Agda-emitted version of this lemma:

<-drop : ∀ {x y : ℕ} → (x < (suc y) ≡ tt) → x ≡ y ∨ x < y ≡ tt

This function looks to take linear time in the size of x, which could be the length of the emitted string encoding in this case.  Where on earth is this called from?  And why is its evaluation getting forced anyway in Haskell’s lazy evaluation model?  <-drop is getting called in

wf-< : ∀ (n : ℕ) → WfStructBool _<_ n

This is the proof that the _<_ ordering on natural numbers is well-founded.  The string-decoding functions have to use well-founded recursion for Agda to see they are terminating.  You recursively decode some part of the string, and then need to continue on the residual part of the string that has not been decoded yet, which is returned by your recursive call.  Agda cannot see that the residual you are recursing on is a subterm of the starting input string, so it cannot confirm the function is structurally terminating.  The solution is to use well-founded recursion.  And this is taking, as far as I can tell, time quadratic in the size of the input string to be decoded.  These strings are long, so a quadratic time operation (with lots of recursion and pattern matching) is going to kill us.

What is the solution?  Strip out the well-founded recursion and just disable Agda’s termination checker.  I do this, cross my fingers, compile, run, and … it works!  Hot diggety.

So those are the three performance problems we tackled here in Agda: slow type checking (just avoid the type checker altogether by encoding big data structures as strings and decoding at runtime), unwanted compile-time evaluation (interpose postulated runtime-identity to block the redexes), and super slow well-founded recursion (punt and disable the termination checker).  I am interested in any similar experiences readers may have had….

Our shipment from HP of 160 nodes for StarExec arrived this week.  Pictures below courtesy of Aran Cox:

Everything needs to be installed and configured, so it will be some time yet (4-6 weeks) until this is incorporated into StarExec.  At that point, we will finally throw open the doors to the public…

I recently had occasion to read two papers by philosopher and logician Neil Tennant.  These were easily the most thought-provoking papers on philosophy of logic that I have read in the past two years.  One I found absolutely compelling, the other I regret to say I found to be completely and spectacularly wrong.  The writing in both was elegant and often witty, a charming change from the more workman-like efforts of us less cultured Computer Scientists.  The papers are:

I have not read other things by Tennant, having only just discovered that I likely would enjoy doing so.  He seems to be a prolific writer, so I am in luck.  I chose these by title to take with on a short trip, and am very glad I did so.

The title of the first of these is really inspiring — more so even the contents.  Despite my college education in Philosophy, I never studied Kant, nor too much of the analytic-synthetic distinction.  As Tennant summarizes it, the question is whether the truth or falsity of a sentence is determined solely by its (intrinsic) meaning, with no further contribution from the way the external world is, or whether the way the external world is contributes to determining the truth or falsity of the sentence.  In the first case (no contribution from the world) the sentence is called analytic.  In the second (some contribution from the world), it is synthetic.  Now, little as I know about this distinction, it is easy to see from the concepts in question, not to mention the custom of philosophers of exploring issues like this from a huge number of angles with endlessly nuanced arguments and positions, that this is a delicate and potentially complex matter.  But it was certainly an informative simple summary for me.

The main point of “The Law of Excluded Middle is Synthetic A Priori, If Valid” is that strictly classical principles like the law of excluded middle (LEM, $p \vee \neg p$), which Tennant says presume a meta-principle of bivalence — that every sentence is either true or false — are synthetic formulas.  They are a priori if valid, in Tennant’s view, in the sense that they are known absolutely independently (in principle?) of empirical experience.  The point is that there may be some additional metaphysical reasons for believing in bivalence, which would justify acceptance of strictly classical principles.  But there is nothing in the inherent meaning of the logical connectives that make sentences like LEM valid, if one assumes that the meaning of a logical connective is determined by its introduction and elimination rules.  Tennant seems skeptical that there is a real metaphysical justification for bivalence and hence LEM, but he is willing to allow the classical reasoner to persist in his bivalent thinking.  What he is not willing to allow is that such thinking is analytic: classical inferences should be explicitly acknowledged as synthetic.

Tennant’s position on strictly classical principles seems pretty compelling if one adopts a proof-theoretic view of logic, because at least for natural deduction systems (which Tennant seems to favor), classical principles are an add-on to the system, and do not follow the usual pattern of specifying the meaning of a logical connective via introduction and elimination rules. One simply has to postulate LEM as an axiom, or add a principle of proof by contraction (to prove $A$, assume $\neg A$ and derive a contradiction). So those principles do not seem to arise as an intrinsic part of the meaning of the logic connectives. Tennant’s position gives us a way to live with this phenomenon: we must just realize that strictly classical principles do not arise intrinsically from the meanings of the logical connectives because those principles really are extra-logical assumptions. The world may be determinate in the sense of bivalence, and it is at least conceivable that it should fail to be so. If we want to proceed under the assumption that it is, we need to make this assumption explicit in our reasoning.

We can perhaps strengthen this line of reasoning if we think of logic not just as a tool for thinking about the real world, which may or may not be determinate, but as a tool for reasoning within theories, which we know from metamathematics can certainly be incomplete and hence fail to satisfy bivalence. Take the fact that the Continuum Hypothesis (CH) is independent of Zermelo-Frankel set theory with the axiom of choice (ZFC), a celebrated result proved in two halves by Kurt Goedel and Paul Cohen. This shows that ZFC is indeterminate with respect to CH. A more basic example is Goedel’s theorem that first-order arithmetic is incomplete. For reasoning in such theories, perhaps it is more faithful to avoid LEM as part of the logic. If we think of the theories as describing only so much of the imaginable mathematical universe, then equipping them with LEM seems wrong. ZFC does not tell us whether CH is true or CH is false. Why should we postulate that despite this indeterminacy, one or the other of CH and its negation must be true? This seems to suggest a view of mathematical theories as being about all of some pre-existing mathematical reality, where there really is a truth of the matter about CH, whether the theory tells it to us or not. But given the multiple incompatible mathematical universes one can describe with a logical theory — for example, there are consistent extensions of ZFC where CH it true and other (incompatible) extensions where it is false — it has always seemed strange to me to assume there is a single pre-existing mathematical reality that is being described by different theories. In this sense, I consider myself an anti-realist about mathematics.

But there are a couple challenges we could mount to Tennant’s position that LEM, as an appendage to the usual natural deduction proof system for (say) first-order logic, should be regarded as synthetic instead
of analytic. First, we could question the choice of natural deduction as the proof system to use for deciding what is intrinsic to a logical connective and what is not. For example, it is well-known that if we use an unrestricted multi-succedent sequent calculus, we can derive classical principles like Peirce’s law just from the rule for assumptions and the left and right rules for implication. Here is  a derivation:

If we are willing to pass from minimal to intuitionistic logic (with a rule of explosion — also called ‘ex falso quodlibet’, or ‘from false, whatever’ — saying that from $\perp$ we can conclude any formula $A$), we can derive $((A \to \perp) \to \perp) \to A$ with a similar derivation. We have used no additional axioms embodying strictly classical principles. The unrestricted right rule for implication has sneakily included a subtle form of classical reasoning into the proof system: so subtle it is not so easy to see what its source is! One explanation for what is happening is that that rule is not compatible with the Kripke semantics for intuitionistic (or minimal) logic, where positive knowledge must persist in future worlds, but negative knowledge need not. We can view the presence of $A$ on the right as an assumption that $A$ is false, and hence we should not be allowed to use that assumption when we step to a new world where we are assuming $A$ is true. But this depends on a particular semantics for the logic. If we are taking the logical rules as primitive, what stops us from adopting this perspective? Likely there is some other way to find bivalence hidden in the basic form of sequents and the rules acting on them (though I do not know what it is).

The other challenge I see to Tennant’s view is simply to deny that the proof-theoretic approach to the semantics of the logical connectives is the right one. After all, the usual boolean semantics seems pretty
natural! I assume there are sophisticated answers to be made to this, so I will leave it be for now, and turn briefly to the second paper.

“Negation, Absurdity, and Contrariety” is largely concerned with denying that $\perp$ (falsity) has a legitimate role in logical formulas. Tennant asks “Is $\perp$ a genuine propositional constant, eligible to be a constituent of other sentences?” The succinct answer he offers is “On the face of it, no; unless $\perp$ is identified with a particular absurd sentence of an already interpreted language.” He goes on to offer reasonably convincing arguments against accepting $0 = 1$ as a definition of $\perp$ (largely because now the notion of falsity is tied to some particular theory, in this case arithmetic). Now, I think we can make Tennant’s job more difficult here, by choosing a formula which is not tied to a particular theory, to be our definition of $\perp$. For example, why not take the formula $\forall x. \forall y. x = y$? Even if we do not think that equality should be considered a logical connective, it does not seem tied to a particular theory. Certainly many theories besides just arithmetic rely on a notion of equality.  So perhaps this definition of $\perp$ can better resist a charge of being overly theory-dependent. Or if that is not abstract enough, let us just switch to a higher-order logic. There, it is standard to define $\perp$ as $\forall F. F$. This definition immediately gives rise to the ex falso quodlibet rule of inference mentioned above), since if $\forall F. F$ is true, we can conclude any formula $A$ by instantiating $F$ with $A$. This is a very strong notion of falsity, as Tennant is also noting (since it implies that a contradiction in one part of a theory automatically infects all other reasoning one might do).

Frank Fu has proposed a different notion of falsity, that has better properties when used in type theory. We can define Leibniz equality $x =_T y$ of terms $x$ and $y$ of type $T$ to be $\forall C:T \to \textit{Prop}. C\ x \to C\ y$. So we are saying that $x$ equals $y$ if all truths about $x$ are also truths about $y$. Now define $\perp$ to be $\forall T:\textit{Type}. \forall x : T. \forall y : T. x =_T y$. Using this definition, Frank has shown we can derive $0 \neq 1$ in pure type theory, where $0$ and $1$ are Church-encoded.

Now all these points aside, I come to my main complaint about this second paper of Tennant’s. He asks whether $\perp$ is a genuine propositional constant that one should be allowed to use in object-language sentences. He devotes significant space to arguing against defining $\perp$. I’d be willing to grant these arguments. But the real question for me is whether or not it makes sense to take $\perp$ as an undefined logical constant. And here, Tennant offers very skimpy arguments. He just says that whenever someone says something like “Contradiction!” or “That’s absurd!”, she is offering a metalinguistic commentary on whether a contradiction can be derived from some set of assumptions. And from this claim, in just a few sentences of consideration, Tennant confidently dismisses the idea of taking $\perp$ as a
propositional constant.

I find this completely inadequate as an argument against $\perp$ as a logical constant. After all, the role of object-language formulas is to reflect certain metalinguistic realities. This is certainly so from Tennant’s constructivist perspective, which emphasizes a tight agreement between the meanings of logical constants and metalinguistic inference rules. I don’t see how this same objection that Tennant is mounting — that something cannot be a logical connective if all it is doing is reflecting metalinguistic commentary — cannot be applied to any formula construct! One could just as well say that when someone says $A_1 \wedge A_2$, all he is really doing is making a metalinguistic statement about derivability of both $A_1$ and $A_2$. So we should not use any object language at all, but speak only in our metalanguage!
(And how is its meaning to be determined?)

A variation on my objection to Tennant: truth and falsity play quite a dual role in many different formulations of logic. How is Tennant’s objection not to be dually applied to rule out the use of $\top$? (Maybe he would urge this — but then I am surprised he takes aim just at $\perp$, and neglects $\top$).

Anyhow, much as I repudiate Tennant’s conclusion in this second paper, I found it as enjoyable and interesting reading as the first, and commend both to you.