Last Monday (June 10th), Geoff Sutcliffe, Cesare Tinelli, and I conducted a workshop on our StarExec project. This is a large NSF-funded project to build a web service to support the infrastructure needs of logic-solving communities. There are a lot of such communities out there: SAT, SMT, TPTP, QBF, Termination, Confluence (CoCo), ASP, HMC, Nonclassical, and quite a few more. These different communities have developed around particular fragments of logic, with particular algorithmic problems to solve. Because Computational Logic is rather a dynamic field these days, largely due to applications in software analysis and verification, there are new subcommunities springing up pretty frequently. To get your community going, the current best practice says you should:
1. agree on a format for problems (“benchmarks”)
2. start to build a benchmark library which can be used for comparing solvers
3. run some kind of competition or, less aggressively, an “evaluation”
4. establish some kind of meeting, perhaps a workshop at a bigger conference, to provide a locus for (1)-(3)
5. possibly set up a solver execution service like SystemOnTPTP or termcomp (or the now-retired SMT-EXEC), to let users run their own workloads
There is interesting community-specific work to be done for all of these, of course. But some parts of this process are rather generic, and could in principle be done once and for all, for all communities. That is the goal of StarExec. We have a small compute cluster set up right now, with 32 compute nodes, and a web interface (and a browser-free command-line interface) to interact with it. We have substantial funds to buy probably around 150 more compute nodes. This month we have our first public events: SMT-EVAL 2013, an evaluation of SMT solvers; and CoCo 2013, a competition among confluence-checking tools for term-rewriting systems.
We are gradually making the system more available to the public. There are still some significant features to be added this summer, and there are bugs we have not tracked down yet. But if you are interested in checking out the system, you can log in as a guest here (though you cannot run jobs that way), or email me. By the end of the summer we will hopefully be ready to advertise more broadly and let more people on the system — although with only 32 nodes, it is not going to be too exciting. We are hoping to have the new hardware purchased and installed by the end of the calendar year.
The purpose of this post is to summarize a bit of the discussion at the workshop, for the benefit of those interested in StarExec (including our advisory board). We got a lot of great feedback and discussion from the invited participants of the StarExec 2013 workshop (listed on the workshop web page). I will just quickly summarize some points I noted from the participants:
- Daniel Le Berre (SAT): perhaps benchmark preprocessors (which can be invoked whenever a benchmark is uploaded) should be allowed to rewrite the benchmark, for example to normalize it. This could also be useful for lightly scrambling benchmarks as is sometimes done, but then one would probably want the ability to preprocess benchmarks again, after they have been uploaded. This would be a useful feature anyway, in my opinion. Geoff pointed out this would be good for postprocessing job results, too (a community-wide script can be run on the solver output, to generate attributes in the database for that job pair).
- Harald Zankl (Termination, Confluence): the experience of the Termination community (for rewriting systems) was that buying exotic hardware (e.g., machines with large numbers of cores or huge amounts of memory) is a bad idea, because solver implementors are incentivized to tune to that hardware. Harald said that within a couple of years, no one could run a termination checker on ordinary hardware like a laptop. This seems like a strong argument against buying a small amount of exotic hardware.
- Stefan Schulz (TPTP): UPS units are more trouble than they are worth, because they fail often. I find this somewhat controversial, because we have so far had pretty good luck, and it was my understanding that including such units in racks for clusters was a best practice (in environments where power is not reliable).
- Thomas Krennwallner (ASP, FLoC Olympic Games): we need to use numactl to set the “memory node” affinity of processes, not just the core affinity. Apparently, cores can access memory banks associated with other cores, and if this is misconfigured, it can hurt performance. This was news to almost all of us there, I think, and something we have to look into. Thomas also told us about Linux control groups (“cgroups”), which allow one to control the resource usage of groups of processes. This is great, because we are currently using a tool called runsolver for this. Runsolver seems to work quite well, but does not handle some resource issues, like this memory-node affinity business and, I believe, maximum disk usage. If there is an OS-level solution, that is really great, so we will explore this, too.
- Stefan: we should reserve a node or two for running a single short test job whenever a solver is uploaded. That way, you do not have to sit and wait your turn to run your solver on the compute nodes, only to find that there is a configuration problem or platform issue and your solver won’t run. It would be very helpful for solver uploaders to get quick confirmation that the solver is able to run on StarExec.
- Daniel: community leaders should be given the option to copy default spaces into a new user’s space. This is so that as a new user, when you go to your space within a community, you could immediately have at least a sample of benchmarks and a couple of solvers there, so you could start to use the system quickly. In our current setup, a new user will need to copy some benchmarks and solvers from somewhere (or upload some) into his/her own space, in order to run a job. This seems like a good idea, and well in the spirit of the community-level parameterization that we are trying to support in the system.
- Morgan Deters (SMT): showing some kind of activity log per user or community would help import some ideas of social networking into the site.
- Geoff: we should support downloading XML for jobs. Right now, you can download XML for spaces, and that XML will describe the space hierarchy, with its benchmarks and solvers. We should be able to download and upload XML for jobs, so that you can tweak jobs by removing job pairs, or making other modifications. I am not fully convinced by this one, because I have been using the system successfully (as a user) by just creating a space hierarchy (possibly by uploading space XML) to represent a workload that needs to be run. To run it, I just create a job in that space. I can rerun it easily by creating a new job there. So I am not sure why we need a separate mechanism for using XML to create a job.
- Stefan: we should support the ability to use all cores on the nodes, in case exact timing does not matter too much and we just want to ram a large workload through the system. Indeed, Nikolaj Bjørner (SMT) was telling us that on their internal cluster at Microsoft Research, they can configure jobs to run with one job pair per core, per socket, or per node. This is attractive, although I am not looking forward to fitting this into the dreadful Oracle Grid Engine job management system that we are currently using (Daniel suggested Torque, and Thomas suggested Condor, as alternatives).
- Christoph Benzmüller (HOL, Nonclassical): it would be nice if we could give at least a worst-case upper-bound estimate of how long it would take a job to make it through the cluster, given existing competing jobs. This seems like a good idea, and certainly a current worst-case bound should be easy to compute — but it will change as new workloads come in or out of the system. So it may not be too useful after all.
- We also discussed the issue of querying, for both benchmarks and job results. For example, you might want to filter job pairs based on criteria such as whether all solvers errored out on a benchmark (possibly indicating a problem with the benchmark itself), or whether only one solver solved a particular benchmark.
- We also had a stimulating discussion, initiated by Jens Otten (Nonclassical), about what makes a solver community. It is a largely social phenomenon, of course, and we have to set some kind of informal threshold, to prevent a proliferation of one-person communities on the system. I think a rough rule of thumb is this: if some kind of organized meeting, like a workshop, is proposing the community for StarExec, that is sufficient.
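To make Thomas’s point about numactl and cgroups a bit more concrete, here is a rough sketch of how one might combine the two for a single job pair. This is not our actual setup: the cgroup name, the 4 GiB limit, and the `./solver benchmark.p` invocation are all placeholders, it uses the older cgroup-v1 filesystem interface, and it requires root.

```shell
# Hypothetical sketch: cap a job pair's memory via a cgroup, and pin
# both its CPUs and its memory allocations to one NUMA node.

# 1. Create a memory cgroup for this job pair with a 4 GiB limit
#    (cgroup-v1 style interface; requires root).
mkdir -p /sys/fs/cgroup/memory/jobpair42
echo $((4 * 1024 * 1024 * 1024)) > /sys/fs/cgroup/memory/jobpair42/memory.limit_in_bytes

# 2. Put the current shell into the cgroup, so child processes inherit it.
echo $$ > /sys/fs/cgroup/memory/jobpair42/tasks

# 3. Bind the solver's CPUs *and* its memory allocations to NUMA node 0,
#    so it never pays for accesses to a remote memory bank.
numactl --cpunodebind=0 --membind=0 ./solver benchmark.p
```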
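The per-core granularity Stefan and Nikolaj described could, in the crudest form, look something like the following sketch. The core count, solver name, and benchmark names are placeholders, and a real implementation would of course go through the job scheduler rather than a shell loop.

```shell
# Hypothetical sketch: one job pair per core, pinning each solver
# process to its own core with taskset.
ncores=4   # e.g., a 4-core node; a real setup would query the hardware
for c in $(seq 0 $((ncores - 1))); do
  taskset -c "$c" ./solver "bench_$c.smt2" &
done
wait   # block until all per-core solver runs finish
```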
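For Christoph’s suggestion, the crude worst-case bound is at least easy to state: with P queued job pairs, a per-pair timeout of T seconds, and N nodes each running one pair at a time, nothing finishes later than roughly P*T/N seconds from now. All the numbers below are invented for illustration.

```shell
# Crude worst-case queue-time bound: assume every queued pair runs to
# its full timeout, spread evenly over the nodes. Figures are made up.
pairs=5000      # job pairs queued ahead of (and including) your job
timeout=60      # per-pair wallclock timeout, in seconds
nodes=32        # nodes, each running one pair at a time
echo $(( pairs * timeout / nodes ))   # upper bound in seconds
```

As noted above, this bound goes stale as soon as new workloads enter or leave the queue, which is why it may be of limited use in practice.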
There’s more I’m sure I missed noting here, but this captures some of our discussion. I have received quite a few emails from people since the workshop, so maybe I will update this post as I reply to those.