## Category Archives: Rewriting

RTA 2011 was held in Novi Sad, Serbia, the last week of May, as part of the RDP 2011 federated conference, which brings together RTA and TLCA every two years.   I was not able to attend any of TLCA ’11, though the program looked quite exciting.  RTA was really great, though.  Silvia Ghilezan, who served as the conference chair for both conferences, did an outstanding job in ensuring that the conference logistics ran smoothly and pleasantly, including excursions in the local area.  For me, this was quite interesting, since my family on my wife’s side is Bulgarian, and there are a lot of cultural and linguistic similarities between the two countries (although no doubt people with longer experience would tell me there are many differences, too).  I was able, occasionally, to get somewhere using my baby Bulgarian, although the shuttle driver from the airport warned me that the languages only share about 30% of their words.

Anyhow, back to RTA.  The technical program was quite fascinating for me.  The invited talk by Sophie Tison surveyed results at the intersection of tree-automata theory and term rewriting, including so-called “preservation of recognizability”.  The idea is to study what classes of term-rewriting systems preserve recognizability of a set of terms by a tree-automaton.  That is, if a set T of terms is recognizable by a tree automaton, under what conditions (on the term-rewriting system) is the set of descendants (or alternatively, ancestors) of terms in T is recognizable by a tree automaton.  There is apparently a line of literature on this problem, which Sophie Tison surveyed in her talk, along with several other interesting variations on tree automata (particularly ones with equality and disequality constraints).  Tree automata are studied in the nice book, available online, Tree Automata Techniques and Applications (“TATA”), of which Sophie Tison is a co-author. This would make a nice topic for a CLC seminar…

I really enjoyed many talks in the program, and this post will go on forever if I transcribe my 10 pages of notes.  Let me just hit a couple exciting highlights.  Fer-Jan de Vries gave a very interesting talk related to confluence of infinitary lambda calculus.  The best thing for me about this talk was that it included a reminder of the basic fact (unknown to me before this) that infinitary lambda calculus is not convergent, under at least some notion of infinitary reduction (there are several, and they involve defining a notion of limit of an infinite reduction sequence).  A counterexample is the term $\Omega^I$, defined as follows:

• $I = \lambda x . x$
• $\Omega^I = I\ ((\lambda x. I\ (x\ x))\ (\lambda x. I\ (x\ x)))$
• $\Omega = ((\lambda x. x\ x)\ (\lambda x. x\ x))$

We have $\Omega^I$ reducing (finitarily, and also in the limit, apparently) to $\Omega$.  But we also have

$\Omega^I\ \rightarrow\ I\ \Omega^I\ \rightarrow\ I\ (I\ \Omega^I)\ \rightarrow \cdots$

The limit of this sequence is an infinitary term $I\ (I\ (I\ \cdots))$.  And that term is not joinable with $\Omega$.  Pretty weird!  Just another example of how infinity tends to defy our finitary intuitions.

There was an amazing tool talk about Anagopos, a visualization tool for lambda calculus and term rewriting reductions.  This tool can draw pretty mind-blowingly cool graphs representing reductions of lambda-calculus terms, and related things.  The two Danish (I believe) Master’s students who presented this did a fine job, both with the tool and the talk.

Ok, I guess I am transcribing my notes a bit.  There was a lot of very interesting material.  René Thiemann had some very interesting things to say about the Termination Competition.  One example: in every competition since 2004, including the 2011 edition, there have been tools producing wrong answers.   Now, the good news is that the Termination Competition has a certified division, where tools produce proofs which are then checked by certifiers based on proof assistants like Coq.  This means they get a rather high degree of confidence that results are correct, at least for this division.  René noted later that the CETA certifier has a fully verified parser — this is an interesting point, since I have read not that long ago that tools like the Compcert verified C (subset) compiler lack this, and it is actually a source of bugs for Compcert (where the rest of the compiler, which has been proven sound in Coq, is bug free).  This was discussed in the fantastic paper, which I saw referenced on Lambda the Ultimate, by John Regehr et al. entitled “Finding and Understanding Bugs in C Compilers”.  Since I saw this through LtU, I did not want to post about it independently, but man that is a great paper.  Everyone interested in compilers and verification should definitely read it.

A few quick other things, as I need to wrap this up.  I had many great conversations with very interesting people, including Georg Moser, a proof-theorist who is working now in automated complexity analysis for term-rewriting systems.  He had a very impressive paper showing that the dependency-pair framework for showing termination of term-rewriting systems gives essentially the mutliply recursive functions, which are (apparently) those derivable using only primitive recursions at type of order 1 (I always get confused about whether orders of types start at 0 or 1 — this is the type $\textit{nat}\rightarrow\textit{nat}$).  I also had a very nice conversation with Nao Hirokawa, whose clever ideas and excellent research agenda very much impressed me.  Nao is also a co-organizer of the recently announced Confluence Competition (CoCo), which will run for the first time in 2012.  I am talking with Nao about his possibly visiting me at The University of Iowa this fall sometime, which would be fun and really great for our group.  Ashish Tiwari gave a great invited talk about applications of term-rewriting to systems biology, and other topics.  This was also a really impressive talk, as Ashish summarized a bunch of very strong work with interesting applications.  Stephanie Weirich, with whom I collaborate, along with Tim Sheard, on the Trellys project, gave the sole shared invited talk between RTA and TLCA — quite an honor.  She gave a great talk, which sparked a lot of interesting conversations with people like Andreas Abel, Thorsten Altenkirch, and Peter Dybjer, all of whom I got to speak with later that day.

In the end, I emerged convinced that RTA is a gem of a conference, with a pleasant and very serious core community, which is also quite welcoming to relative newcomers (such as myself).  The acceptance rate is higher than they’d like (around 55% for this edition), which is not necessarily a bad thing if you are an aspiring author.  The next edition, RTA ’12, will be in Nagoya, Japan.  I also heard from several people that they were interested in possibly coming to the US for a later edition.  I am seriously considering proposing Iowa City, Iowa, for the next available RTA, which looks to be RTA 2015 (as 2013 is in Eindhoven, and 2014 will be with FLoC).  That would be an RDP year, so if I did propose this, and the proposal were accepted, that would bring RTA and TLCA to The University of Iowa, which would be super cool for us, and probably also for many researchers interested in types (for TLCA) and rewriting in North America.

I just want to post a link to the slides for my RTA ’11 talk on our paper “Type Preservation as a Confluence Problem”, co-authored with Garrin Kimmell and Ruba El Haj Omar.  This paper received the best paper award for RTA ’11, which was really exciting for us, of course, and led to a number of interesting conversations at the conference.  I’m planning to post more on the conference itself, which was fantastic, in the next few days.

For a long time, I have been puzzled by the connection between term rewriting and parsing.  My intuition is that it should be possible to view parsing as the process of rewriting an input string to its syntax tree.  Why would one wish to do adopt this viewpoint?  Well, since many computer scientists view parsing as a solved problem, there has not been so much research activity in parsing in the past two decades.  There has certainly been quite a bit, don’t get me wrong, including a number of exciting developments (I am not an expert, but I hear about these things, like GLR parsers, scannerless parsers, parsing by derivatives, etc.).  In contrast, term-rewriting is a more general setting, with lots of interesting provably unsolvable problems.  And nothing gets a theoretical computer scientist fired up liking solving the unsolvable.  Term-rewriting researchers solve 8 unsolvable problems before breakfast, on a slow day, and the research community, while not huge, is very active and has achieved a number of impressive results (as I have commented in this blog before).  So I have always wondered if we could succeed in applying the power of modern term-rewriting theory to parsing, whether we might end up with more usable, flexible, and powerful tools, instead of the unpleasant and brittle ones we limp along with today.  I say these harsh things about mainstream parser generators, because in a sense, they have no interface: debugging a grammar written for an LR(1) parser generator, say, requires intimate understanding of the LR(1) parsing algorithm.  At least, this is true for tools like Bison or ocamlyacc.

The similarity between parsing and term rewriting (or string rewriting) is so obvious that one would think the connection has been fully explored, particularly since both fields are very mature.  Perhaps it has, but statements like the following, from the 1993 monograph “String-Rewriting Systems” by Ronald Book and Friedrich Otto, makes me think my ignorance on this point might not be due just to lack of reading the right literature yet:

One might think that results about string-rewriting systems and techniques developed for studying string-rewriting systems could be of use in studying context-free grammars and languages.  However, there are few examples of this.  Indeed, it would be of interest to determine precisely how the string-rewriting theory can be applied to formal language theory in general (including L-systems).

The obvious basic starting idea on this is just to interpret productions of a context-free grammar (CFG) as string-rewriting rules, by reversing the arrows.   So take a sample (ambiguous) grammar like this one:

1. expr -> NUM
2. expr -> expr + expr

Reversing the arrows gives a string-rewriting system:

1. NUM -> expr
2. expr + expr -> expr

If we want to rewrite strings to their syntax trees, then we will want to have term-rewriting rules, rather than string rewriting rules.  My understanding is that a standard encoding of string rewriting as term rewriting is to view every letter as a unary function symbol.  So we would have

1. NUM(r) -> expr(r)
2. expr(+(expr(r))) -> expr(r)

Here, the variable r would match the rest of the string (encoded as a term).  These rules don’t yet build syntax trees.  It seems a little nicer to me, for that purpose, to use a different encoding of string rewriting as term rewriting, where letters are constants (0-ary function symbols), and there is a right-associative binary operator “.” for connecting letters into strings.  By the way, on either encoding, we can use a constant symbol “\$” for the end of the string:

1. NUM . r -> expr . r
2. expr . + . expr . r -> expr . r

Now to build syntax trees, we just need to make non-terminals unary function symbols, whose argument is the parse tree.  We can assign each rule a name, and use that to label the nodes of the syntax tree associated with an application of the rule (here, I will assume that NUM has no associated data, although in reality we might use a scannerless approach where even terminal symbols are unary, holding the sequence of characters matched from the string).  I am calling the first rule Num and the second Plus.

1. NUM . r -> expr (Num). r
2. expr(t1) . + . expr(t2) . r -> expr(Plus(t1,t2)) . r

Now we start to see a benefit of this approach: the resulting term rewriting system can be checked for confluence, and we will discover the critical pair expr(Plus(Plus(t1,t2),t3)) . r  and expr(Plus(t1,Plus(t2,t3)) . r.  The user could simply add a rewrite rule to indicate which associativity is preferred.  We could employ termination checkers and (local) confluence checkers as needed, to ensure the term rewriting system was guaranteed to yield unique parse trees for every string.  Strings not in the language of the original grammar would have parse trees that are malformed, in the sense that they still contain the “.” operator: they are partly parse trees, partly unparsed strings.  That might conceivably make error reporting easier.

But finally, we come to the puzzle I wanted to share with you about all this.  Ambiguity of CFGs — the existence of multiple distinct parse trees for a single string — is well-known to be undecidable.  The proof in “Introduction to Automata Theory, Languages, and Computation” by Hopcroft and Ullman is by reduction from Post’s Correspondence Problem, and is quite slick.  But with the translation I am suggesting, we would end up with terminating term rewriting systems (assuming we do not add any rules as I suggested for associativity above).  It would seem that the analogous property to ambiguity would be confluence of the resulting rewrite system (the one that builds parse trees).  But confluence of terminating term-rewriting systems is decidable: one just checks for local confluence by testing joinability of critical pairs (like the example I gave above) resulting from non-trivial overlapping of left-hand sides of rules.  And joinability for a terminating system is decidable.  So we seem to have a mismatch: ambiguity of CFGs is undecidable, but confluence of the translations of the CFGs is decidable.  What is going on?

.

.

.

.

[You might want to ponder this for a moment — unless you already see the answer.  It took me a day to get this.  My answer is below. ]

.

.

.

.

There are several things to say about the relation between ambiguity of CFGs and confluence of their translations to term-rewriting systems.  But basically, the fuzzily proposed connection is not precise enough.  I conjecture that ambiguity of CFGs corresponds exactly to ground confluence towards a target (the start symbol) for the resulting term-rewriting system.  By ground confluence, I mean confluence for variable-free terms only.  And by “confluence towards a target”, I mean confluence for terms which rewrite to a certain constant symbol (or here, an application of the start symbol to a parse tree), and only such terms.  Ground confluence towards a target is quite a different problem than just confluence, and that is why there is not a paradox here.  In practice, it seems this means that using a confluence checker would be overly conservative: it might label a term-rewriting system as ambiguous (non-confluent) which is actually non-ambigous (ground confluent).  Of course, confluence implies ground confluence, so there would be no false reports of unambiguity, only ambiguity.  I am interested in exploring further whether an approach like this — and I have not talked about the issue of rewriting towards a target here, which also must be addressed — could be used to create a more usable parser generator based on term-rewriting.