blog_post_tests/20141024061857.blog

Moving the NetBSD repository
<p>Some people on the NetBSD tech-repository list have <a href="http://mail-index.netbsd.org/tech-repository/tindex.html">wondered</a> why I&#8217;ve been working on a full NetBSD repository conversion without a formal request from NetBSD&#8217;s maintainers that I do so.</p>
<p>It&#8217;s a fair question. An answer to it involves both historical contingency and some general issues about moving and mirroring large repositories. Because of the accident that a lot of people have recently dropped money on me in part to support an attack on this problem, I&#8217;m going to explain both in public.</p>
<p><span id="more-6476"></span></p>
<p>First, the historically contingent part:</p>
<p>1. Alan Barrett tried to run a full conversion of NetBSD using cvs-fast-export last December and failed (OOM). He then engaged me and we spent significant effort trying to reduce the program&#8217;s working set, but could not prevent OOM on either of the machines we were using. Because Alan was willing to work on this at some length, I formed the idea that there was real demand for a full NetBSD conversion.</p>
<p>2. The NetBSD repo is large and old. I wanted a worst-possible-case (or near worst-possible-case) to test the correctness of the tool on. I knew there might be larger repositories out there (and now it appears that Gentoo&#8217;s is one such) but for obvious historical reasons I thought NetBSD would be an exemplary near-worst case. Thus, it would be a worthy test even if the politics to get the result deployed didn&#8217;t pan out.</p>
<p>I have since been told that NetBSD actually has a git mirror of its CVS repository produced with a two-step conversion: CVS -> Fossil -> git.</p>
<p>This makes me nervous about the quality of the result. Repo conversions produce artifacts due to ontological mismatches between the source and target systems; a two-stage process will compound the problems. That in turn gives rise to exactly the kinds of landmines one least wants &#8211; not obvious on first inspection but chronically friction-causing down the road.</p>
<p>I&#8217;m not speaking theoretically about this; I&#8217;m currently dealing with a major case of landmine-itis in the Emacs repository, which has (coincidentally) just been scheduled for a full switch to git on Nov 11. I&#8217;ve been working on that conversion for most of a year.</p>
<p>For a really high-quality conversion even a clean single-stage move needs human attention and polishing. This is why reposurgeon is designed to amplify the judgment of a human operator rather than attempt to fully mechanize the conversion.</p>
<p>I understand there is internal controversy within NetBSD over a full switch to git. I don&#8217;t really want to get entangled in the political part of the discussion. However, as a technical expert on repository conversions and their problems, I urge the NetBSD team to <em>move the base repository to something with real changesets as soon as possible.</em></p>
<p>It doesn&#8217;t have to be git. Mercurial would do; even Subversion would do, though I don&#8217;t recommend it. I&#8217;m not grinding an axe for git here, I&#8217;m telling you that the most serious, crazy-making traps for the unwary lie in the move from a version-control system without full coherent changesets to a VCS with one. Once you have that conversion done and clean, moving the repository content to any other such system is relatively easy.</p>
<p>(Again, I&#8217;m not speaking theoretically &#8211; reposurgeon is the exact tool you want for such cross-conversions.) </p>
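<p>To make the changeset point concrete: CVS records history per file, so a converter has to guess which file revisions belonged to one logical commit. The following is a minimal sketch of one common heuristic (same author, same log message, timestamps close together), not a description of what cvs-fast-export actually does. The <code>FileRev</code> type, the <code>coalesce</code> function, and the 300-second window are all hypothetical names and values chosen for illustration.</p>

```python
from collections import namedtuple

# Hypothetical record for one per-file CVS revision.
FileRev = namedtuple("FileRev", "path author log when")


def coalesce(revs, window=300):
    """Group per-file revisions into guessed changesets.

    A revision joins an existing changeset when it has the same author
    and log message, falls within `window` seconds of that changeset's
    latest revision, and does not touch a path already in it.
    The window size is an assumption, not a CVS or git constant.
    """
    changesets = []
    latest = {}  # (author, log) -> most recent changeset for that key
    for rev in sorted(revs, key=lambda r: r.when):
        key = (rev.author, rev.log)
        current = latest.get(key)
        if (current is not None
                and rev.when - current[-1].when <= window
                and all(r.path != rev.path for r in current)):
            current.append(rev)
        else:
            current = [rev]
            latest[key] = current
            changesets.append(current)
    return changesets


revs = [
    FileRev("a.c", "alice", "fix bug", 1000),
    FileRev("b.c", "alice", "fix bug", 1004),
    FileRev("a.c", "alice", "fix bug", 1010),
    FileRev("c.c", "bob", "add docs", 1002),
]
grouped = [[r.path for r in cs] for cs in coalesce(revs)]
```

<p>Every branch of that heuristic can guess wrong (clock skew, reused log messages, slow commits over a network), which is one reason the result benefits from human review rather than purely mechanical conversion.</p>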
<p>This is my offer: I have the tools and the experience to get you to the changeset-oriented VCS of your choice. I can do a really good job, better than you&#8217;ll ever get from mechanical mirroring or a batch converter, because I know all about common conversion artifacts and how to do things like lifting old version references and ignore-pattern files.</p>
<p>It looks like my tools are git-oriented because they rely on git fast-import streams as an interchange format, but I&#8217;m not advocating git per se &#8211; I&#8217;m urging you to <em>move somewhere with changesets</em>. It&#8217;s a messy job and it wants an expert like me on it, but it only has to be done once. Afterwards, the quality of your developer experience and your future technical options with regard to what VCS you actually want to use will both greatly improve.</p>
<p>Related technical point: the architectural insight behind my tools is that the git folks created something more generally useful than they understood when they defined import streams. Having an editable transfer format that can be used to move content and metadata relatively seamlessly between VCSes is as important in the long term as the invention of the DVCS &#8211; possibly more so.</p>
<p>cvs-fast-export emits a fast-import stream not because I&#8217;m a git partisan (I actually rather wish hg had won the mindshare war) but because that&#8217;s how you get to a sufficiently expressive interchange format.</p>
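<p>For readers who have not seen one, a fast-import stream is a plain text-and-bytes serialization of blobs and commits that can be read, edited, and replayed. The sketch below builds a tiny one-commit stream by hand. The helper names <code>blob</code> and <code>commit</code>, the committer identity, and the timestamp are illustrative assumptions; only the stream keywords (<code>blob</code>, <code>mark</code>, <code>data</code>, <code>commit</code>, <code>committer</code>, <code>from</code>, <code>M</code>) come from the git fast-import format.</p>

```python
def blob(mark, content):
    """Serialize file content as a fast-import blob with the given mark."""
    data = content.encode("utf-8")
    # 'data' takes an exact byte count, followed by the raw bytes.
    return b"blob\nmark :%d\ndata %d\n%s\n" % (mark, len(data), data)


def commit(mark, ref, name, email, when, message, files, parent=None):
    """Serialize one commit; `files` is a list of (path, blob_mark)."""
    msg = message.encode("utf-8")
    out = b"commit %s\n" % ref.encode("utf-8")
    out += b"mark :%d\n" % mark
    out += b"committer %s <%s> %d +0000\n" % (
        name.encode("utf-8"), email.encode("utf-8"), when)
    out += b"data %d\n%s\n" % (len(msg), msg)
    if parent is not None:
        out += b"from :%d\n" % parent
    for path, blob_mark in files:
        # 100644 is an ordinary non-executable file.
        out += b"M 100644 :%d %s\n" % (blob_mark, path.encode("utf-8"))
    return out + b"\n"


# Hypothetical content and identity, for illustration only.
stream = blob(1, "hello\n") + commit(
    2, "refs/heads/master", "A U Thor", "author@example.com",
    1414131537, "Initial import\n", [("README", 1)])
```

<p>Writing <code>stream</code> to a file and feeding it to <code>git fast-import</code> inside a freshly initialized repository should produce a one-commit history. Because the format is this simple, a tool can edit it between export and import.</p>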
<p>I&#8217;ll mail this to tech-repository once I can find out how to sign up.</p>