Ugliest…repository…conversion…ever
<p>Blogging has been light lately because I&#8217;ve been up to my ears in reposurgeon&#8217;s most serious challenge ever. Read on for a description of the ugliest heap of version-control rubble you are ever likely to encounter, what I&#8217;m doing to fix it, and why you do in fact care &#8211; because I&#8217;m rescuing the history of one of the defining artifacts of the hacker culture.</p>
<p><span id="more-5634"></span></p>
<p>Imagine a version-control history going back to 1985 &#8211; yes, <em>twenty-nine</em> years of continuous development by no fewer than 579 people. Imagine geologic strata left by no fewer than <em>five</em> version-control systems &#8211; RCS, CVS, Arch, bzr, and git. The older portions of the history are a mess, with incomplete changeset coalescence in the formerly-CVS parts and crap like paths prefixed with &#8220;=&#8221; to mark RCS masters of deleted files. There are hundreds of dead tags and dozens of dead branches. Comments and changelogs are rife with commit-reference cookies that no longer make sense in the view through more modern version-control systems.</p>
<p>Your present view of the history is a sort of two-headed monster. The official master is in bzr, but because of some strange deficiencies in bzr&#8217;s export tools (which won&#8217;t be fixed because bzr is moribund) you have to work from a poor-quality read-only git mirror that gets automatically rebuilt from the bzr history every 15 minutes. But you can&#8217;t entirely ignore the bzr master; you have to write custom code to data-mine it for bzr-related metadata that you need for fixing references in your conversion.</p>
<p>Because bzr is moribund, your mission is to produce a full standalone git conversion that doesn&#8217;t suck. Criteria for &#8220;not sucking&#8221; include (a) complete changeset coalescence in the RCS and CVS parts, (b) fixing up CVS and bzr commit references so a human being browsing through git can actually follow them, (c) making sense out of the mess that is RCS deletions in the oldest part of the history.</p>
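<p>To make &#8220;changeset coalescence&#8221; concrete, here is a minimal sketch of the general idea. It is not reposurgeon&#8217;s actual algorithm. RCS and CVS record each file&#8217;s revisions separately, so a converter has to guess which per-file commits were really one logical change. The sketch assumes that per-file commits sharing an author and a log message, and landing within a short time window of each other, belong together. The names <code>FileCommit</code>, <code>Changeset</code>, and <code>coalesce</code>, and the 300-second window, are all hypothetical.</p>

```python
from dataclasses import dataclass, field


@dataclass
class FileCommit:
    """One per-file revision, as RCS/CVS records it (hypothetical model)."""
    path: str
    author: str
    message: str
    timestamp: int  # seconds since the epoch


@dataclass
class Changeset:
    """A group of per-file revisions treated as one logical commit."""
    author: str
    message: str
    start: int
    end: int
    paths: list = field(default_factory=list)


def coalesce(commits, window=300):
    """Group per-file commits into changesets.

    Two commits join the same changeset when they share author and
    message, the gap to the changeset's latest commit is at most
    `window` seconds, and the file is not already in the changeset
    (a second revision of the same file must start a new changeset).
    """
    changesets = []
    latest = {}  # (author, message) -> most recent Changeset for that key
    for c in sorted(commits, key=lambda c: c.timestamp):
        key = (c.author, c.message)
        cs = latest.get(key)
        if (cs is not None
                and c.timestamp - cs.end <= window
                and c.path not in cs.paths):
            cs.paths.append(c.path)
            cs.end = c.timestamp
        else:
            cs = Changeset(c.author, c.message, c.timestamp, c.timestamp, [c.path])
            changesets.append(cs)
            latest[key] = cs
    return changesets
```

<p>&#8220;Incomplete&#8221; coalescence, in these terms, means a history where some per-file commits that should have been grouped this way were left as separate commits.</p>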
<p>Also, because the main repo is such a disaster area, there is at least one satellite repo for a Mac OS X port that really wants to be a branch of the main repo, but isn&#8217;t. (Instead it&#8217;s a two-tailed mutant clone of a nine-year-old version of the main repo.) You&#8217;ve been asked to pull off a cross-repository history graft so that after conversion day it will look as though the whole nine years of OS X port history has been a branch in this repo from the beginning.</p>
<p>Just to put the cherry on top, your customers &#8211; the project dev group &#8211; are a notoriously crusty lot who, on the whole, do not go out of their way to be helpful. If not for a perhaps surprising degree of support from the project lead, the full git conversion wouldn&#8217;t be happening at all. Fortunately, the lead groks that it is important in order to lower the barrier to entry for new talent.</p>
<p>I have been working hard on this conversion for eight solid weeks. Supporting it has required that I write several major new features in reposurgeon, including a macro facility, large extensions to the selection-set sublanguage, and facilities for generic search-and-replace on both metadata and blobs.</p>
<p>Experiments and debugging are a pain in the ass because the repository is so big and gnarly that a single full conversion run takes around ten hours. The lift script is over 800 lines of complex reposurgeon commands &#8211; and that&#8217;s not counting the six auxiliary scripts used to audit and generate parts of it, nor an included file of mechanically-generated commands that is over <em>two thousand</em> lines long.</p>
<p>You might very well wonder what could make a repository conversion worth that kind of investment of time and effort. That&#8217;s a good question, and one of those for which you either have enough cultural context that a one-word answer will suffice or else hundreds of words of explanation wouldn&#8217;t be enough.</p>
<p>The one word is: Emacs.</p>