Skip navigation

Category Archives: Coding

I will be posting again about lambda-encoding data, but I wanted to share one interesting experience first, applying a fuzzer to our versat verified SAT solver, primarily implemented by Duckki Oe, who is, by the way, currently looking for postdocs or possibly industrial verification positions.  This solver took Duckki, with some contributions from me, Corey Oliver, and Kevin Clancy, 2 years to implement.  Along the way, he did find and fix a couple of bugs in parts of the solver we were not verifying.   For example, at some point there was a bug in watched lists, so new clauses were never indexed. That affected performance behavior, and took a while to detect.  But it was not needed for UNSAT-soundness (that is, correctness of UNSAT answers, which is what we verified).  Also, Duckki debugged some SAT-unsoundness issues, which were (easily) detected by model checking at the end of solving.  So writing verified code does not mean there is no debugging to do.


Certainly for what we verified we have not found a bug in evaluating versat.  Such a bug would represent an error in our specification, which is short (consisting of the definition of propositional resolution and a parser for CNF formulas in DIMACS format); or else a bug in the Guru compiler or logic.  The latter is not out of the question, but unlikely to manifest itself as accepting an invalid proof.   And just today I followed the lead of John Regehr in actually trying to find bugs in a piece of code that has been proven correct.  After all, versat is UNSAT-sound in theory, but maybe not in practice!  John did this for the CompCert verified optimizing C compiler, and after expending 6 compute years on searching for bugs, could not find even one in the verified core of the compiler (but several in the parser and similar unverified parts).  I used the cnfuzz tool by Robert Brummayer, Florian Lonsing, and Armin Biere.  I wrote a simple wrapper script that would call cnfuzz to generate random nasty CNF benchmarks to feed to versat, and also to MiniSat 2.2.0.  Then I just compared their answers.  I ran 10,000 (different) benchmarks this way (way less of an evaluation than John reported for CompCert).  Of these, versat timed out on 27 with a 60 second time limit.  For all the rest, the answers are the same for the two tools.

Let me try to emphasize just how remarkable this is.  A modern SAT solver is not a huge amount of code (typically), but it is highly optimized and very tricky.  I would expect quite a bit of testing before such a tool written in a conventional language is reliable.  For example, way back in the day when I was a grad student working on the CVC SMT solver (first version, with subsequent versions implemented by Clark Barrett and his students and my colleague Cesare Tinelli and his students), I remember many fond hours (days, weeks) spent with pieces of paper taped together covered in handwritten annotated formulas as I tried to find why the solver was giving the wrong answer in some case.  Now, an SMT solver is truly a lot more complicated than a SAT solver, and usually with a lot bigger codebase.   But still, I would fully expect similar pain in getting a modern SAT solver up and running reliably.

For versat, modulo a couple of bug fixes earlier in development for unverified parts of the solver, it works perfectly right out of the gate.  We finally get it through the type checker after 6 months of proving lemmas about its data structures and algorithms, and it is completely correct, at least as testing on 10,000 random benchmarks reveals compared to MiniSat.  Notice, it is not just the UNSAT answers that are right.  The SAT ones are right, too.  I have noticed this in developing real verified software that you actually intend to run (as opposed to just formalize, prove correct, and discard).  Verifying some properties tends to increase overall correctness, even for properties you don’t verify.  The verification is so ridiculously hard and requires such deep involvement in what the tool does, that you just end up with a higher quality piece of code in the end — if you get something working at all, which is a very difficult proposition, requiring highly skilled engineering talent (did I mention Duckki is looking for postdoc and industrial verification positions).

So, in the end, I feel this is great confirmation — and it was a great idea of John Regehr’s to do this in general — of the quality of a verified piece of software.  Without extensive testing and debugging, versat exactly matches MiniSat’s behavior on 10,000 benchmarks (modulo 0.27% timeouts) generated by cnfuzz.  This is what a verified-programming proponent dreams of: you write your code, you verify it with blood sweat and tears, and it just works perfectly at the end.  It is a great feeling, and one for which I am looking for additional interesting case studies to repeat.

Mild neglect of duty continues here in favor of performance evaluation, which I was discussing in my previous post.  Here are a couple more useful things I found:

  • Using fgetc_unlocked() cut my processing time by 25% parsing a 120MB proof.
  • Using a raw C array of characters instead of an STL string to hold the current token cut another 25%.

These two simple changes have got my times down on a 1GB proof to 45s user time, which I am quite happy with.  Watching what is happening using “top”, CPU utilization is around 50% parsing this proof, which I interpret to mean that the process is IO-bound now.  So on my laptop, at any rate, I’m reaching the limit of how quickly I can tear through this big proof on disk.

As I noted in comments to the previous post, pulling in bigger chunks of text from disk using either fread() or mmap() does not improve on fgetc().

Hope everyone is having a pleasant and/or productive weekend.

The end of the semester here at U. Iowa is plenty busy, but (or perhaps consequently?) I have been doing some coding on our OCaml implementation of LFSC (“Logical Framework with Side Conditions”), a descendant of the Edinburgh LF that we are proposing as a standard metalanguage for proofs produced by SMT solvers.  This compiler takes a signature as input — where a signature just defines the syntax and proof rules for your logic, logical theory, or other formal system of interest — and produces C++ code for checking proofs expressed in that logic.  Our previous version of this compiler was written in C++.  It was quite complicated, somewhat difficult to maintain, and hard to use as a basis for many of our optimizations (because we were compiling signatures to C++ code that was just extending the C++ code for the compiler itself).  So we (Jed McClurg, Cuong Thai, and myself) have been implementing a new compiler for LFSC in OCaml this semester (with important input and advice from Cesare Tinelli and Andy Reynolds — see for more about our group).

Anyhow, writing this in OCaml is so much cleaner and nicer, mostly because OCaml is just much better for writing symbolic tools, but partly also because we have learned a bunch from our previous experience.  In particular, we have trimmed down LFSC quite a bit, to focus it on its intended application.  Also, we know better how to implement things efficiently in the emitted C++ proof checker.  For example, take parsing.  The approach we used in the C++ LFSC compiler, and are working to implement in the OCaml version, is called incremental checking.  The C++ proof checker (emitted by the LFSC compiler) parses and checks the proof in an interleaved fashion, as opposed to parsing first to an abstract syntax tree, and then checking.  This approach is about 5x faster in our experience, partly because it uses much less memory.  It can therefore also handle much bigger proofs.

To get to the title of this post: as a warm-up for emitting incremental checking code from the OCaml LFSC compiler, I have been implementing something that just parses proofs, without checking them (or building ASTs for them).  The prefix syntax of LFSC proofs is very simple: the first symbol of any proof uniquely determines what kind of proof it is.  So we could just write a recursive descent parser.  But that would use C’s stack to keep track of what our pending parsing tasks are.  It is much more efficient to use a stack data structure in the language.  And there we come to what I wanted to post about this evening: C++ stacks and their inefficiencies.

But first, let’s make sure the algorithm is reasonably clear.  You are trying to parse prefix expressions, where the operator (which comes first) uniquely determines which subexpressions should be parsed next, in order to parse an expression beginning with that operator.  For a simple example, here are some first-order logic formulas in prefix notation:

formula f ::= true | false | and f1 f2 | imp f1 f2 | or f1 f2 | forall i ^ f.

The caret notation for forall is binding notation (and “i” is an individual, from another syntactic category), which LFSC supports (but which we can ignore for the discussion about C++ stacks).  A nice efficient way to parse formulas like these is to keep a stack of pending work to be done.  Each entry in the stack should hold an operator (or more efficiently, operator tag as an integer), and the number of arguments parsed so far.  Depending on how many arguments have been parsed, one either parses the next argument from the appropriate syntactic category, or else does a “reduce”, which pops the stack and increments the number at the top of the resulting stack (since we have just parsed one more subexpression for the expression that is now indicated by the top of the stack).

So in the end, we need stacks.  C++ has stacks: just use std::stack<int>, say, from the Standard Template Library, for your stack of ints holding the number parsed (and another for the operator tag).  So I implemented compilation from signatures to C++ code to parse proofs (and also syntactic expressions like the formulas listed above), using STL stacks.  The performance was somewhat disappointing: around 89s (wallclock) to parse a 1GB proof (averaged across three runs), on my somewhat oldish Latitude D420 laptop.  I profiled the emitted code, and found that an inordinate amount of time was going into stack operations, which seemed suspicious.  According to gprof on the unoptimized executable — and in my experience gprof’s time estimates are often unreliable, though call counts presumably are fine — something like 60% of the time was going into stack operations!  Admittedly, there are a lot of them: 1.6 billion calls to top(), for example.  But still, that was looking like an obvious optimization target.

STL stacks use another data structure to implement the underlying collection of stack entries, so I tried changing from deque’s (the default), to lists (much worse) and also vectors (slightly worse).  I had had bad experiences in the past with STL hashtables’ being sluggish, so I just quickly hacked my own little naive stack class.  No resizing, just a big array, push(), pop(), and top().  This reduced my time on the big proof to 79s (wallclock) on average, for a 12% speedup.  For user times, the speed up is less, more like 5%.  Here’s the code (very simple stuff — sorry the formatting is not coming through better here):

#ifndef LFSC_STACK_H
#define LFSC_STACK_H

#include <cstdlib>

template<typename Tp>
class Stack {

int max_size;
int last;
Tp *arr;

Stack(int _max_size) {
max_size = _max_size;
arr = new Tp[max_size];
last = -1;

inline void push(Tp t) {
if (last >= max_size) {
// should report error
arr[last] = t;

inline void pop() {
if (last < 0) {
// should report error

inline Tp &top() {
if (last < 0) {
// should report error
return arr[last];

inline int size() {
return last+1;



So I am a little disappointed in the STL data structure, although I acknowledge it is doing more than my simple implementation.  Nevertheless, this is the point of my post here: if you are hacking in C++ and looking for performance, be wary of the STL containers.  In my experience, you can usually beat them, especially if you customize a little for your application.


Get every new post delivered to your Inbox.