The end of the semester here at U. Iowa is plenty busy, but (or perhaps consequently?) I have been doing some coding on our OCaml implementation of LFSC (“Logical Framework with Side Conditions”), a descendant of the Edinburgh LF that we are proposing as a standard metalanguage for proofs produced by SMT solvers. This compiler takes a signature as input — where a signature just defines the syntax and proof rules for your logic, logical theory, or other formal system of interest — and produces C++ code for checking proofs expressed in that logic. Our previous version of this compiler was written in C++. It was quite complicated, somewhat difficult to maintain, and hard to use as a basis for many of our optimizations (because we were compiling signatures to C++ code that was just extending the C++ code for the compiler itself). So we (Jed McClurg, Cuong Thai, and myself) have been implementing a new compiler for LFSC in OCaml this semester (with important input and advice from Cesare Tinelli and Andy Reynolds — see clc.cs.uiowa.edu for more about our group).

Anyhow, writing this in OCaml is so much cleaner and nicer, mostly because OCaml is just much better for writing symbolic tools, but partly also because we have learned a bunch from our previous experience. In particular, we have trimmed down LFSC quite a bit, to focus it on its intended application. Also, we know better how to implement things efficiently in the emitted C++ proof checker. For example, take parsing. The approach we used in the C++ LFSC compiler, and are working to implement in the OCaml version, is called *incremental checking*. The C++ proof checker (emitted by the LFSC compiler) parses and checks the proof in an interleaved fashion, as opposed to parsing first to an abstract syntax tree, and then checking. This approach is about 5x faster in our experience, partly because it uses much less memory. It can therefore also handle much bigger proofs.

To get to the title of this post: as a warm-up for emitting incremental checking code from the OCaml LFSC compiler, I have been implementing something that just parses proofs, without checking them (or building ASTs for them). The prefix syntax of LFSC proofs is very simple: the first symbol of any proof uniquely determines what kind of proof it is. So we could just write a recursive descent parser. But that would use C’s stack to keep track of what our pending parsing tasks are. It is much more efficient to use a stack data structure in the language. And there we come to what I wanted to post about this evening: C++ stacks and their inefficiencies.

But first, let’s make sure the algorithm is reasonably clear. You are trying to parse prefix expressions, where the operator (which comes first) uniquely determines which subexpressions should be parsed next, in order to parse an expression beginning with that operator. For a simple example, here are some first-order logic formulas in prefix notation:

formula f ::= true | false | and f1 f2 | imp f1 f2 | or f1 f2 | forall i ^ f.

The caret notation for forall is binding notation (and “i” is an individual, from another syntactic category), which LFSC supports (but which we can ignore for the discussion about C++ stacks). A nice efficient way to parse formulas like these is to keep a stack of pending work to be done. Each entry in the stack should hold an operator (or more efficiently, operator tag as an integer), and the number of arguments parsed so far. Depending on how many arguments have been parsed, one either parses the next argument from the appropriate syntactic category, or else does a “reduce”, which pops the stack and increments the number at the top of the resulting stack (since we have just parsed one more subexpression for the expression that is now indicated by the top of the stack).

So in the end, we need stacks. C++ has stacks: just use std::stack<int>, say, from the Standard Template Library, for your stack of ints holding the number parsed (and another for the operator tag). So I implemented compilation from signatures to C++ code to parse proofs (and also syntactic expressions like the formulas listed above), using STL stacks. The performance was somewhat disappointing: around 89s (wallclock) to parse a 1GB proof (averaged across three runs), on my somewhat oldish Latitude D420 laptop. I profiled the emitted code, and found that an inordinate amount of time was going into stack operations, which seemed suspicious. According to gprof on the unoptimized executable — and in my experience gprof’s time estimates are often unreliable, though call counts presumably are fine — something like 60% of the time was going into stack operations! Admittedly, there are a lot of them: 1.6 billion calls to top(), for example. But still, that was looking like an obvious optimization target.

STL stacks use another data structure to implement the underlying collection of stack entries, so I tried changing from deque’s (the default), to lists (much worse) and also vectors (slightly worse). I had had bad experiences in the past with STL hashtables’ being sluggish, so I just quickly hacked my own little naive stack class. No resizing, just a big array, push(), pop(), and top(). This reduced my time on the big proof to 79s (wallclock) on average, for a 12% speedup. For user times, the speed up is less, more like 5%. Here’s the code (very simple stuff — sorry the formatting is not coming through better here):

#ifndef LFSC_STACK_H

#define LFSC_STACK_H#include <cstdlib>

template<typename Tp>

class Stack {private:

int max_size;

int last;

Tp *arr;public:

Stack(int _max_size) {

max_size = _max_size;

arr = new Tp[max_size];

last = -1;

}inline void push(Tp t) {

last++;

if (last >= max_size) {

// should report error

exit(1);

}

arr[last] = t;

}inline void pop() {

if (last < 0) {

// should report error

exit(1);

}

last–;

}inline Tp &top() {

if (last < 0) {

// should report error

exit(1);

}

return arr[last];

}inline int size() {

return last+1;

}};

#endif

So I am a little disappointed in the STL data structure, although I acknowledge it is doing more than my simple implementation. Nevertheless, this is the point of my post here: if you are hacking in C++ and looking for performance, be wary of the STL containers. In my experience, you can usually beat them, especially if you customize a little for your application.