Risk, Verification, and the INTERCAL Reconstruction Massacree
<p>This is the story of the INTERCAL Reconstruction Massacree, an essay in risk versus skepticism and verification in software development with a nod in the general direction of <a href="http://www.arlo.net/resources/tablature/display.php?file=alices.tab">Arlo Guthrie</a>.</p>
<p>About three hours ago as I began to write, I delivered on a promise to probably my most distinguished customer ever &#8211; Dr. Donald Knuth. Don (he asked me to call him that, honest!) had requested a bug fix in INTERCAL, which he plans to use as the subject of a chapter in his forthcoming book <cite>Selected Papers on Fun And Games</cite>. As of those three hours ago Donald Knuth&#8217;s program is part of the INTERCAL compiler&#8217;s regression-test suite. </p>
<p>But I&#8217;m not actually here today to talk about Donald Knuth, I&#8217;m here to talk about risk versus skepticism and verification in software engineering &#8211; in five part harmony and full orchestration, using as a case study my recent experiences in (once again) calling INTERCAL forth from the realm of the restless dead.</p>
<p><span id="more-2491"></span></p>
<p>(Feel free to imagine an acoustic guitar repeating a simple ragtime/blues tune in the background. For atmosphere.)</p>
<p>Those of you coming in late may not be aware that (1) INTERCAL is the longest-running and most convoluted joke in the history of programming language design, and (2) all modern implementations of this twisted, sanity-sucking horror are descended from one that I tossed off as a weekend hack in 1990 here in the town of Malvern, Pennsylvania (the manual describing the language goes back to 1972, but before my C-INTERCAL there hadn&#8217;t been a running implementation available in about a decade). </p>
<p>Since then, the attention I&#8217;ve given C-INTERCAL has been rather sporadic. Years have gone by without releases, which is less of a dereliction of duty than it might sound like considering that the entire known corpus of INTERCAL code ships with the compiler. INTERCAL attracts surreality the way most code attracts bitrot; after one longish maintenance hiatus (around the turn of the millennium, whilst I was off doing the Mr. Famous Guy thing on behalf of open source) I discovered that INTERCAL had nucleated an entire weird little subculture of esoteric-language designers around itself, among whom I had come to be regarded as sort of a patriarch in absentia&#8230;.</p>
<p>Despite my neglect, every once in a while something like that would happen to remind me that I was <em>responsible</em> for this thing. Donald Knuth provided the most recent such occasion; so I gathered together my editors and debuggers and implements of destruction and dusted off the code, only to discover that it had been a full seven years since I&#8217;d last done so. (I can see the tagline now: &#8220;INTERCAL has ESR declared legally dead, film at 11.&#8221;)</p>
<p>A week of work later, I was even more nonplussed to discover that others had been doing serious work on the compiler while I wasn&#8217;t looking. Notably, there was one Alex Smith (aka ais523, hail Eris, all hail Discordia!), a doughty Englishman who&#8217;d been shipping a descendant of my 2003 code since 2006. With lots of new features, including a much more general optimizer based on a technique that could be described as a compiler compiler compiler. (That&#8217;s &#8220;compiler to the third meta&#8221;, for those of you in the cheap seats.)</p>
<p>I straightaway wrote Alex explaining the challenge from Knuth and suggesting we defork our projects. He agreed with gratifying enthusiasm, especially when I explained that what I actually wanted to do was reconstruct as much of the history of C-INTERCAL as possible at this late date, and bash it all unto a repo in a modern distributed version control system which he and I could then use to cooperate. Now, early in this essay I introduced by stealth one of the topics of my discourse on skepticism and verification, the regression-test suite (remember the regression-test suite?). This is another one, the DVCS. We&#8217;ll get back to the DVCS.</p>
<p>Reconstructing the history of C-INTERCAL turned out to be something of an epic in itself. 1990 was back in the Dark Ages as far as version control and release-management practices go; our tools were paleolithic and our procedures likewise. The earliest versions of C-INTERCAL were so old that even CVS wasn&#8217;t generally available yet (CVS 1.0 didn&#8217;t even ship until six months after C-INTERCAL 0.3, my first public release). SCCS had existed since the early 1970s but was proprietary; the only game in town was RCS. Primitive, file-oriented RCS.</p>
<p>I was a very early adopter of version control; when I wrote Emacs&#8217;s VC mode in 1992 the idea of integrating version control into normal workflow that closely was way out in front of current practice. Today&#8217;s routine use of such tools wasn&#8217;t even a gleam in anyone&#8217;s eye then, if only because disks were orders of magnitude smaller and there was a lot of implied pressure to actually throw away old versions of stuff. So I only RCSed some of the files in the project at the time, and didn&#8217;t think much about that. </p>
<p>As a result, reconstructing C-INTERCAL&#8217;s history turned into about two weeks of work. A good deal of it was painstaking digital archeology, digging into obscure corners of the net for ancient release tarballs Alex and I didn&#8217;t have on hand any more. I ended up stitching together material from 18 different release tarballs, 11 unreleased snapshot tarballs, one release tarball I could reconstruct, one release tarball mined out of an obsolete Red Hat source RPM, two shar archives, a pax archive, five published patches, two zip files, a darcs archive, and my partial RCS history, and that&#8217;s before we got to the aerial photography. To perform the surgery needed to integrate this, I wrote a custom Python program assisted by two shellscripts, topping out at a hair over 1200 lines of code.</p>
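<p>The actual import tool isn&#8217;t worth reproducing here, but the core trick is simple enough to sketch. What follows is a minimal illustration in Python, with hypothetical tarball names and dates and nothing like the real 1200-line program: unpack each dated snapshot over a scratch work tree in chronological order and commit it with its original date, so that the resulting git history replays the releases.</p>
<pre><code>
#!/usr/bin/env python3
# Minimal sketch: replay dated release tarballs as successive git commits.
# Tarball names and dates below are hypothetical placeholders.
import os
import shutil
import subprocess
import tarfile

SNAPSHOTS = [  # oldest first: (ISO 8601 date, tarball path)
    ("1990-05-28T00:00:00", "intercal-0.3.tar.gz"),
    ("1990-07-01T00:00:00", "intercal-0.4.tar.gz"),
]

WORKDIR = "workdir"

def git(*args, **dates):
    # Run a git command inside the work tree, optionally forcing commit dates.
    env = dict(os.environ, **dates)
    subprocess.check_call(("git",) + args, cwd=WORKDIR, env=env)

def clear_worktree():
    # Remove everything except the git metadata before unpacking a snapshot.
    for entry in os.listdir(WORKDIR):
        if entry == ".git":
            continue
        path = os.path.join(WORKDIR, entry)
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)

def unpack(tarball):
    # Unpack a snapshot; if it wraps everything in one top-level directory,
    # lift that directory's contents into the work tree.
    tmp = "unpack.tmp"
    with tarfile.open(tarball) as tf:
        tf.extractall(tmp)
    entries = os.listdir(tmp)
    src = tmp
    if len(entries) == 1 and os.path.isdir(os.path.join(tmp, entries[0])):
        src = os.path.join(tmp, entries[0])
    for entry in os.listdir(src):
        shutil.move(os.path.join(src, entry), os.path.join(WORKDIR, entry))
    shutil.rmtree(tmp)

def main():
    os.mkdir(WORKDIR)
    git("init")
    git("config", "user.name", "Digital Archaeologist")
    git("config", "user.email", "archaeology@example.com")
    for date, tarball in SNAPSHOTS:
        clear_worktree()
        unpack(tarball)
        git("add", "-A")
        git("commit", "-m", "Import snapshot from " + tarball,
            GIT_AUTHOR_DATE=date, GIT_COMMITTER_DATE=date)

if __name__ == "__main__":
    main()
</code></pre>
<p>The scripting is the easy tenth of the job; the other nine-tenths was the archeology itself, hunting down the snapshots and working out their true dates and parentage.</p>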
<p>You can get a look at the results by cloning the resulting git repo from git://gitorious.org/intercal/intercal.git. Now, friends, you may be wondering why I bothered to do all this rather than simply starting a repo with ais523&#8217;s latest snapshot and munging my week&#8217;s worth of changes into it, and all I&#8217;m going to say about that is that if the answer isn&#8217;t intuitively obvious to you, you have missed the point of INTERCAL and are probably not a hacker. A much more relevant question is why I&#8217;m <em>writing</em> about all this and what it has to do with risk versus skepticism and verification in software engineering. That&#8217;s a good question, and the answer is partly that I want you all to be thinking about how software-engineering practice has changed in the last twenty years, and in what <em>direction</em> it&#8217;s changed. </p>
<p>Software engineering is a huge exercise in attempting to control the risk inherent in writing programs for unforgiving, literal-minded computers with squishy fallible human brains. The strategies we&#8217;ve evolved to deal with this have three major themes: (1) defensive chunking, (2) systematic skepticism, and (3) automated verification.</p>
<p>I&#8217;m not going to go on about defensive chunking much in the rest of this talking blues, because most of the tactics that fit under that strategy aren&#8217;t controversial any more. It&#8217;s been nearly forty years since David Parnas schooled us all in software modularity as a way of limiting the amount of complexity that a programmer&#8217;s brain has to handle at one time; we&#8217;ve had generations, in the tempo of this field, to absorb that lesson.</p>
<p>But I am going to point out that it is highly unlikely we will ever have another archeological epic quite like C-INTERCAL&#8217;s. Because another form of defensive chunking we&#8217;ve all gotten used to in the last fifteen years is the kind provided by version-control systems. What they let us do is make modifications with the confidence that we can revert chunks of them to get back to a known-good state. And, as a result, hackers these days create version-control repositories for new projects almost as reflexively as they breathe. Project history tends not to get lost any more.</p>
<p>Distributed version control systems like git and hg and bzr help; they&#8217;re astonishingly fast and lightweight to use, lowering the overhead of using them to near nothing. And one effect of DVCSes that I&#8217;ve confronted in the last couple of days, as ais523 and I got the new C-INTERCAL repo and project off the ground, is to heighten the tension between development strategies that lean more on systematic skepticism and development strategies that lean more on automated verification. </p>
<p>I&#8217;m going to sneak up on the nature of that tension by talking a bit about DVCS workflows. Shortly after I created the C-INTERCAL repo on gitorious, ais523 and I had a misunderstanding. I emailed him about a feature I had just added, and he pointed out that it had a bug and he&#8217;d pushed a correction. I looked, and I didn&#8217;t see it in the repo, and I asked him, and here&#8217;s what he said:</p>
<blockquote><p>
I pushed it to a separate repository, &lt;http://gitorious.org/~ais523/intercal/ais523-intercal&gt; I thought the normal way to collaborate via git was for everyone to have a separate repository, and the changes to be merged into the main one after that. Should I try to push directly to the mainline?
</p></blockquote>
<p>Here&#8217;s what I said in reply:</p>
<blockquote><p>
Yes. git workflow is highly variable, and the style you describe is normal for larger projects. Not for small ones, though. I&#8217;ve worked in both styles (my large-project experience is on git itself) so I have a practical grasp on the tradeoffs.</p>
<p>For projects the size of C-INTERCAL (or my gpsd project, which has at most about half a dozen regular committers) the most convenient mode is still to have a single public repo that everybody pulls from and pushes to. Among other things, this workflow avoids putting a lot of junk nodes in the metadata history that are doing nothing but marking trivial merges.</p>
<p>This is not quite like regressing to svn :-), because you can still work offline, you&#8217;re not totally hosed if the site hosting the public repo crashes, and git is much, *much* better at history-sensitive merging.
</p></blockquote>
<p>Now, this may sound like a boring procedural point, but&#8230;remember the regression-test suite? Have a little patience and wait till the regression-test suite comes around on the guitar again and I promise I&#8217;ll have a nice big juicy disruptive idea for you right after it. Maybe one that even undermines some of my own previous theory.</p>
<p>Here&#8217;s what ais523 came back with, and my next two replies telescoped together:</p>
<blockquote><p>
&gt; Ah; my previous DVCS experience has mostly been in small projects where<br />
&gt; we kept different repos because we didn&#8217;t really trust each other. It<br />
&gt; was rather common to cherry-pick and to ignore various commits until<br />
&gt; they could be reviewed, or even redone from scratch&#8230;</p>
<p>Interesting. The git group functions this way, but I&#8217;ve never seen it on<br />
any of the small projects I contribute to. Makes me wonder about<br />
cultural differences between your immediate peer group and mine.</p>
<p>I should note that on the gpsd project, one of the reasons the<br />
single-public-repo works for us is that we have a better alternative<br />
to mutual trust &#8211; an *extremely* effective regression-test suite. The<br />
implicit assumption is that committers are running the regression<br />
tests on every nontrivial commit. Why trust when you can verify? :-)</p>
<p>I guess I&#8217;m importing that philosophy to this project.</p>
<p>Hm. I think I should blog about this.
</p></blockquote>
<p>Why trust when you can verify, indeed? But I now think the more interesting question turns out to be: Why <em>distrust</em> when you can verify?</p>
<p>Keeping different repos because you don&#8217;t really trust each other, cherrypicking and having an elaborate patch-review process, being careful who gets actual commit privileges in what &#8211; this is what the Linux kernel gang does. It&#8217;s the accepted model for large open-source projects, and the hither end of a long line of development in software engineering strategy that says you cope with the fallibility of squishy human brains by applying systematic skepticism. In fact, if you&#8217;re smart you design your development workflow so it <em>institutionalizes</em> decentralized peer review and systematic skepticism.</p>
<p>Thirteen years ago I wrote <cite>The Cathedral and The Bazaar</cite> and published the generative theory of open-source development that had been implicit in hackers&#8217; practice for decades. If anyone living has a claim to be the high priest of the cult of systematic skepticism in software development, that would be me. And yet, in <em>this</em> conversation about C-INTERCAL, as in several previous I&#8217;ve had about my gpsd project since about 2006, I found myself rejecting much of the procedural apparatus of systematic skepticism as the open-source community has since elaborated it&#8230;in favor of a much simpler workflow centered on a regression-test suite.</p>
<p>Systematic skepticism has disadvantages, too. Time you spend playing the skeptic role is time you can&#8217;t spend designing or coding. Good practice of it imposes overhead at every level from maintaining multiple repositories to the social risk that an open-source project&#8217;s review-and-approval process may become as factional, vicious and petty-politicized as a high-school cafeteria. Can there be a better way?</p>
<p>The third major strategy in managing software-engineering risk is automated verification. This line of development got a bad reputation after early techniques for proving code correctness turned out not to scale past anything larger than toy programs. Fully automated verification of software has never been practical and there are good theoretical reasons (like, the proven undecidability of the Halting Problem) to suppose that it never will be.</p>
<p>Still. Computing power continues to decrease in cost as human programming time increases in cost; it was inevitable that there would be steadily growing interest in test-centered development and in setting programs to catch other programs&#8217; bugs. Conditional guarantees of the form &#8220;I can trust this software if I can trust its test suite&#8221; can have a lot of value if the test suite is dramatically simpler than the software.</p>
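<p>For concreteness, here is about the simplest shape such a conditional guarantee can take: a golden-file regression harness that runs the program under test on a directory of captured inputs and compares each output byte-for-byte against a checked-in expected result. The <code>tests/</code> layout and the <code>./program</code> binary below are hypothetical placeholders; this is a sketch of the pattern, not the actual gpsd or C-INTERCAL suites.</p>
<pre><code>
#!/usr/bin/env python3
# Minimal golden-file regression harness: for every tests/NAME.in, run the
# program under test on it and compare its output with tests/NAME.out.
# The tests/ layout and ./program binary are hypothetical placeholders.
import glob
import subprocess
import sys

def main():
    failures = 0
    for input_path in sorted(glob.glob("tests/*.in")):
        expected_path = input_path[:-3] + ".out"
        with open(input_path, "rb") as f:
            actual = subprocess.run(["./program"], stdin=f,
                                    stdout=subprocess.PIPE).stdout
        with open(expected_path, "rb") as f:
            expected = f.read()
        if actual == expected:
            print("ok:   " + input_path)
        else:
            failures += 1
            print("FAIL: " + input_path)
    sys.exit(1 if failures else 0)

if __name__ == "__main__":
    main()
</code></pre>
<p>The point of keeping the harness this dumb is exactly the conditional guarantee above: it is so much simpler than the software it exercises that trusting it is easy, and &#8220;run the regression tests before every nontrivial push&#8221; becomes a workable substitute for a human gatekeeper.</p>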
<p>Thirteen years ago I wrote that in the presence of a culture of decentralized peer review enabled by cheap communications, heavyweight traditional planning and management methods for software development start to look like pointless overhead. That has become conventional wisdom; but I think, perhaps, I see the next phase change emerging now. In the presence of sufficiently good automated verification, the heavyweight vetting, filtering, and review apparatus of open-source projects as we have known them also starts to look like pointless overhead. </p>
<p>There are important caveats, of course. A relatively promiscuous, throw-the-code-through-the-test-suite-and-see-if-it-jams style can work for gpsd and C-INTERCAL because both programs have relatively simple coupling to their environments. Building test jigs is easy and (even more to the point) building a test suite with good coverage of the program&#8217;s behavior space isn&#8217;t too difficult. </p>
<p>What of programs that don&#8217;t have those advantages? Even with respect to gpsd there are a few devices that have to be live-tested because their interactions with gpsd are too complex to be captured by a test jig. Operating-system kernels and anything else with real-time requirements are notoriously hard to wrap test harnesses around; even imagining all the timing problems you might want to test for is brutally hard to do. Programs with GUIs are also notoriously difficult to test in an automated way.</p>
<p>I think this objection actually turns into a prescription. We cannot and should not junk the habits of systematic skepticism. Open source is not going to become obsolete, any more than previous big wins (like, say, high-level languages) became obsolete when we figured out how to do open source methodically. But what we could be doing is figuring out how to design for testability and do test-centered development on a wider range of programs, with the goal (and the confident expectation) that doing so <em>will reduce the overhead and friction costs of our open-source processes</em>.</p>
<p>The aim should be to offload as much as possible of the work now done by human skepticism onto test logic so that our procedures can simplify even as our development tempo speeds up and our quality improves. The signature tools of the open-source world over the last fifteen years have been new kinds of collaboration engines &#8211; version control systems, web servers, forge sites. Perhaps the signature tools of the next fifteen years will be test engines &#8211; coverage analyzers, scriptable emulation boxes, unit-test frameworks, code-auditing tools, and descendants of these with capabilities we can barely imagine today.</p>
<p>But <a href="http://esr.ibiblio.org/?p=1925">the way of the hacker is a posture of mind</a>; mental habits are more important than tools. We can get into the habit of asking questions like &#8220;What&#8217;s the <a href="http://en.wikipedia.org/wiki/Code_coverage">coverage</a> percentage of your test suite?&#8221; as routinely as we now ask &#8220;Where&#8217;s your source-code repository?&#8221; </p>
<p>And friends, they may think it&#8217;s a movement. The INTERCAL Reconstruction Anti-Massacree movement. No, wait&#8230;this song&#8217;s not really about INTERCAL. It&#8217;s about how we can up our game &#8211; because the risk and challenges of software engineering never stand still. There&#8217;s an escalator of increasing scale that made the best practices of 1990 (the year C-INTERCAL was born) seem ridiculously patchy in 2010, and will no doubt make today&#8217;s best practices seem primitive in 2030. Our tools, our practices, and our mental habits can&#8217;t stand still either.</p>