blog_post_tests/20140902214717.blog


2014-11-19 15:42:25 +00:00
Adverse selection and old technology
<p>Yesterday I shipped <a href="http://www.catb.org/esr/cvs-fast-export/">cvs-fast-export</a> 1.15, with a significant performance improvement produced by replacing a naive O(n**3) sort with a properly tuned O(n log n) version.</p>
<p>In the ensuing discussion on G+, one of my followers there asked if I thought this was likely to produce a real performance improvement, since on small inputs the constant setup time of a cleverly tuned algorithm often dominates the nominal savings.</p>
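<p>To make that trade-off concrete, here is a minimal sketch of a toy cost model. Everything in it is an assumption made for illustration: the function names, the setup charge, and the per-step constants are invented, and none of them are measurements of cvs-fast-export. It only shows the shape of the argument: below some input size the fixed setup cost wins, and above it the O(n**3) term swamps everything.</p>

```python
import math


def naive_cost(n, per_step=1.0):
    """Cost of a hypothetical O(n**3) sort: per_step * n**3."""
    return per_step * n ** 3


def tuned_cost(n, setup=10_000.0, per_step=5.0):
    """Cost of a hypothetical O(n log n) sort that pays a fixed setup charge."""
    return setup + per_step * n * math.log2(max(n, 2))


def crossover(limit=1_000_000):
    """Return the smallest n at which the tuned sort is cheaper, or None."""
    for n in range(1, limit + 1):
        if tuned_cost(n) < naive_cost(n):
            return n
    return None


if __name__ == "__main__":
    n = crossover()
    print("tuned sort wins from n =", n)
    for size in (10, 100, 10_000):
        print(size, naive_cost(size), round(tuned_cost(size)))
```

<p>With these invented constants <code>crossover()</code> returns 22: the setup cost matters only for inputs of a couple of dozen items, and the gap widens rapidly after that. Different constants move the crossover point, but not the conclusion for large inputs.</p>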
<p>This is one of those cases where an intelligent question elicits knowledge you didn&#8217;t know you had. I discovered that I do believe strongly that cvs-fast-export&#8217;s workload is dominated by large repositories. The reason is a kind of adverse selection phenomenon that I think is very general to old technologies with high exit costs.</p>
<p>The rest of this blog post will use CVS as an example of the phenomenon, and may thus be of interest even to people who don&#8217;t specifically care about version control systems.</p>
<p><span id="more-6216"></span></p>
<p>Cast your mind back to the point at which CVS was definitely superseded by better VCS designs. It doesn&#8217;t matter for this discussion exactly when that point was, but you can place it somewhere between 2000 and 2004 based on when you think Subversion went from a beta program to a production tool.</p>
<p>At that point there were lots of CVS repositories around, greatly varying in size and complexity. Some were small and simple, some large and ugly. By &#8220;ugly&#8221; I mean full of Things That Should Not Be &#8211; tags not corresponding to coherent changesets, partially merged import branches, deleted files for which the masters recording older versions had been &#8220;cleaned up&#8221;, and various other artifacts that would later cause severe headaches for anyone trying to convert the repositories to a modern VCS.</p>
<p>In general, size and ugliness correlated well with project age. There are exceptions, however. When I converted the groff repository from CVS to git I was braced for an ordeal; groff is quite an old project. But the maintainer and his devs had been, it turned out, very careful and disciplined and committed none of the sloppinesses that commonly lead to nasty artifacts.</p>
<p>So, at the point that people started to look seriously at moving off CVS, there was a large range of CVS repo sizes out there, with difficulty and fidelity of up-conversion roughly correlated to size and age.</p>
<p>The result was that small projects (and well-disciplined larger projects resembling groff) converted out early. The surviving population of CVS repositories became, on average, larger and gnarlier. After ten years of adverse selection, the CVS repositories we now have left in the wild tend to be the very largest and grottiest kind, usually associated with projects of venerable age. </p>
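<p>The selection effect described above can be sketched with a small deterministic model. The conversion rule here is purely an assumption for illustration (a repository of size <code>s</code> converts in any given year with probability <code>base_rate / s</code>), and the sizes are made-up numbers, not data about real CVS repositories. The function tracks the expected fraction of each repository still unconverted and reports the mean size of the survivors year by year.</p>

```python
def survivors_mean_size(sizes, years, base_rate=0.5):
    """Return the expected mean size of unconverted repositories, per year.

    Hypothetical rule: a repository of size s (s >= 1) converts in a given
    year with probability base_rate / s, so bigger repositories are less
    likely to leave the population.
    """
    # Expected fraction of each repository that is still unconverted.
    weights = [1.0] * len(sizes)
    means = []
    for _ in range(years + 1):
        total = sum(weights)
        means.append(sum(w * s for w, s in zip(weights, sizes)) / total)
        # Apply one year of conversions, size-dependent.
        weights = [w * (1.0 - base_rate / s) for w, s in zip(weights, sizes)]
    return means


if __name__ == "__main__":
    sizes = [1, 2, 5, 10, 50, 100, 500, 1000]  # arbitrary size units
    for year, mean in enumerate(survivors_mean_size(sizes, years=10)):
        print(year, round(mean, 1))
```

<p>Under this rule the mean size of the surviving population rises every year, even though no individual repository grows: the small ones simply leave first. Any conversion rule in which the chance of converting falls with size gives the same qualitative result.</p>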
<p>GNUPLOT and various BSD Unixes stand out as examples. We have now, I think, reached the point where the remaining CVS conversions are in general huge, nasty projects that will require heroic effort with even carefully tuned and optimized tools. This is not a regime in which the constant startup cost of an optimized sort is going to dominate.</p>
<p>At the limit, there may be some repositories that never get converted, either because the concentrated pain associated with doing that overwhelms any time-discounted estimate of the costs of using obsolescent tools, or because even the best tools may not be good enough to handle their sheer bulk. Emacs was almost there. There are hints that some of the BSD Unix repositories may be there already &#8211; I know of failed attempts, and tried to assist in one of them.</p>
<p>I think you can see this kind of adverse selection effect in survivals of a lot of obsolete technology. Naval architecture is one non-computing field where it&#8217;s particularly obvious. Surviving obsolescent ships tend to be <em>large</em> and ugly rather than small and ugly, because the capital requirement to replace the big ones is harder to swallow.</p>
<p>Has anyone coined a name for this phenomenon? Maybe we ought to. </p>