Ignoring: complex cases
<p>I shipped point releases of cvs-fast-export and reposurgeon today. Both of them are intended to fix some issues around the translation of ignore patterns in CVS and Subversion repositories. Both releases illustrate, I think, a general point about software engineering: sometimes, it&#8217;s better to punt tricky edge cases to a human than to write code that is doomed to be messy, over-complex, and a defect attractor.</p>
<p><span id="more-6146"></span></p>
<p>For those of you new to version-control systems, an ignore pattern tells a VCS about things for the VCS to ignore when warning the user about untracked files. Such patterns often contain wildcards; for example, &#8220;*.o&#8221; is good for telling almost any VCS that it shouldn&#8217;t try to track Unix object files.</p>
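<p>As a rough illustration of how such a wildcard pattern behaves, here is a minimal Python sketch using the standard library&#8217;s shell-style matcher. The helper name <code>is_ignored</code> is made up for this example, and real version-control systems layer their own rules (directory scoping, negation, and so on) on top of plain glob matching.</p>

```python
from fnmatch import fnmatchcase

def is_ignored(filename, patterns):
    """Return True if filename matches any shell-style ignore pattern."""
    # fnmatchcase does case-sensitive glob matching on every platform.
    return any(fnmatchcase(filename, pattern) for pattern in patterns)

patterns = ["*.o"]
print(is_ignored("main.o", patterns))  # an object file: matched
print(is_ignored("main.c", patterns))  # a source file: not matched
```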
<p>In most version-control systems, ignore patterns are kept in a per-directory dotfile. Thus CVS has .cvsignore, git has .gitignore, etc. Ignore patterns in Subversion are <em>not</em> kept in such a dotfile; instead they are the values of svn:ignore properties attached to directories.</p>
<p>Translating ignore patterns between version-control systems is messy enough that most conversion tools fluff it. My reposurgeon tool is an exception; it goes to considerable lengths to translate Subversion ignore properties into patterns in whatever kind of dotfile is required on the target system.</p>
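<p>To give a feel for what such a translation involves, here is a simplified sketch of turning one directory&#8217;s svn:ignore value into .gitignore text. This is not reposurgeon&#8217;s actual code; the function name is hypothetical, and the choice to prefix each pattern with &#8220;/&#8221; is an assumption made here so that the pattern stays local to its directory, the way an svn:ignore property does.</p>

```python
def svn_ignore_to_gitignore(prop_value):
    """Turn one directory's svn:ignore value into .gitignore text.

    Illustrative sketch only, not reposurgeon's real translation rules.
    """
    lines = []
    for raw in prop_value.splitlines():
        pattern = raw.strip()
        if not pattern:
            continue  # svn:ignore values may contain blank lines
        # Assumption: svn:ignore applies only to its own directory, so
        # anchor the pattern with "/" to stop git matching it recursively.
        lines.append("/" + pattern)
    return "".join(line + "\n" for line in lines)

print(svn_ignore_to_gitignore("*.o\nbuild\n"), end="")
```

<p>Even this toy version has to make a judgment call about anchoring, which hints at why the general problem is messy.</p>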
<p>Unfortunately, this feature collides with git-svn. People using that tool to interact with a Subversion repository often create .gitignore files <em>in the Subversion repository</em> which are independent of any native svn:ignore properties it might have.</p>
<p>This becomes a problem when you try to convert the repo to git. In that case, .gitignore files created by git-svn users and .gitignore files generated from the native svn:ignore properties can step on each other in odd ways.</p>
<p>I&#8217;ve had a bug report about this in my inbox for a couple of months. The submitter innocently asked me to write logic that would automatically do the right thing, merging .gitignore patterns with svn:ignore patterns and throwing out duplicates. And somewhere in the back of my brain, a robot voice called out &#8220;WARNING, WILL ROBINSON! DANGER! DANGER!&#8221;</p>
<p>One of the senses you develop after writing complex software for a couple of decades is some ability to tell when a feature is going to be a defect attractor &#8211; a source of numerous hard-to-characterize bugs and a maintenance nightmare. That alarm rang very loudly on this one. But I was blocked for quite a while on the question of what, if any, simpler alternative to go for.</p>
<p>I resolved my problem when I realized that this challenge &#8211; merging the properties &#8211; will be both (a) uncommon, and (b) the sort of thing computers find difficult but humans find easy. Typically it would only have to be dealt with once in the aftermath of a repository conversion, rather than frequently as the repo is in use.</p>
<p>My conclusion was that the best behavior is to discard the hand-hacked SVN .gitignores, warning the user this is being done. It&#8217;s then up to the reposurgeon user to rescue any patterns that should be moved from the old hand-hacked .gitignores to the new generated ones.</p>
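<p>The policy just described can be sketched in a few lines. This is an illustration under assumed data structures (plain dictionaries mapping paths to file contents), not reposurgeon&#8217;s implementation; the function name and the warning text are invented for the example.</p>

```python
import posixpath

def resolve_ignores(repo_files, generated):
    """Drop hand-written .gitignore files and install generated ones.

    repo_files: dict of path -> content found in the Subversion tree.
    generated:  dict of path -> .gitignore content derived from svn:ignore.
    Returns (files, warnings). Hypothetical sketch of the policy only.
    """
    files = {}
    warnings = []
    for path, content in repo_files.items():
        if posixpath.basename(path) == ".gitignore":
            # Punt to the human: discard, but say so loudly.
            warnings.append(f"discarding hand-written {path}")
            continue
        files[path] = content
    files.update(generated)
    return files, warnings

files, warnings = resolve_ignores(
    {"src/.gitignore": "*.o\n", "src/main.c": "int main(void);\n"},
    {"src/.gitignore": "/*.obj\n"},
)
for message in warnings:
    print("warning:", message)
```

<p>Note how little there is to get wrong here compared with a merge: there is one rule, no special cases, and the warning tells the user exactly where to look.</p>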
<p>Because, very often, the hand-hacked .gitignores are there just to duplicate what the native svn:ignore properties are doing, the user often won&#8217;t have to do any work at all. The unusual cases in which that is false are the same unusual cases that automated merge code could too easily get wrong.</p>
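<p>A human doing the rescue might want a quick list of which old patterns have no counterpart in the generated file. The following helper is a hypothetical convenience script, not part of reposurgeon. It assumes that a pattern with and without a leading &#8220;/&#8221; should count as the same thing for review purposes, which is precisely the kind of assumption a person can sanity-check and a merge algorithm cannot.</p>

```python
def patterns_to_review(old_text, new_text):
    """List patterns from a discarded .gitignore missing from the new one."""
    def patterns(text):
        stripped = (line.strip() for line in text.splitlines())
        # Skip blank lines and .gitignore comments.
        return [p for p in stripped if p and not p.startswith("#")]

    def key(pattern):
        # Assumption: treat "/*.o" and "*.o" as equivalent when comparing.
        return pattern.lstrip("/")

    generated = {key(p) for p in patterns(new_text)}
    return [p for p in patterns(old_text) if key(p) not in generated]

print(patterns_to_review("# objects\n*.o\n*.log\n", "/*.o\n"))
```

<p>When the hand-hacked file merely duplicated the svn:ignore patterns, the list comes back empty and there is nothing to do.</p>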
<p>The general point here is that engineering is tradeoffs. Sometimes chasing really recondite edge cases piles up a lot of technical debt for only tiny gains. </p>
<p>The more subtle point is that if you don&#8217;t have any way at all to punt weird cases to a human, your software system may be brittle and overengineered &#8211; handling sporadic exceptional cases at a high life-cycle cost when a human could handle them cheaply and at a cumulatively lower defect risk.</p>
<p>This bears emphasizing because hackers have such a horror of manularity, going to extreme lengths to automate instead. Sometimes, doing that gets the tradeoff wrong.</p>
<p>Reposurgeon creates the option to get this right because it was designed from the beginning as a tool to amplify human judgment rather than trying to automate it entirely out of the picture. By comparison, all other repository-conversion tools take the opposite approach and are brittle in exactly this way.</p>
<p>A similar issue arose with cvs-fast-export. I got a bug report that revealed a couple of issues in how it translates .cvsignore files to .gitignores in its output stream. Among other things, it writes a representation of CVS&#8217;s default ignore patterns into a synthetic .gitignore in the first commit. This is so users browsing the early history in the converted git repo won&#8217;t have untracked files jumping out at them that CVS would have kept quiet about.</p>
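<p>For a sense of what this translation looks like, here is a small Python sketch. It is not cvs-fast-export&#8217;s code (that tool is written in C), the function names are invented, and the defaults list is deliberately partial: it holds only a handful of CVS&#8217;s built-in ignore patterns. The sketch also ignores CVS&#8217;s special handling of a lone &#8220;!&#8221; entry. One wrinkle it does show is that a pattern such as &#8220;#*&#8221; must be escaped, because a leading &#8220;#&#8221; starts a comment in a .gitignore.</p>

```python
# A few of CVS's built-in default ignore patterns (deliberately partial).
CVS_DEFAULTS_SUBSET = ["*.o", "*.a", "*~", "#*", "*.bak", "*.orig", "*.rej", "core"]

def cvs_patterns_to_gitignore(patterns):
    """Render CVS ignore patterns as .gitignore text (illustrative sketch)."""
    lines = []
    for pattern in patterns:
        # In .gitignore a leading "#" is a comment and a leading "!" negates,
        # so escape either one with a backslash to keep it literal.
        if pattern.startswith(("#", "!")):
            pattern = "\\" + pattern
        lines.append(pattern)
    return "".join(line + "\n" for line in lines)

def cvsignore_to_gitignore(text):
    """Convert .cvsignore content, whose entries are whitespace-separated."""
    return cvs_patterns_to_gitignore(text.split())

# Content for a synthetic .gitignore built from the partial defaults list.
print(cvs_patterns_to_gitignore(CVS_DEFAULTS_SUBSET), end="")
```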
<p>With the report, I got a request for a switch to suppress this behavior. The right answer, I replied, was <em>not</em> to add that switch and some complexity to cvs-fast-export. Rather, I reminded the requester that he could easily delete that synthetic .gitignore from the history using reposurgeon. Then I added the command to do that to the list of examples on the reposurgeon man page.</p>
<p>The point, again, is that rushing in to code a feature would have been the wrong thing &#8211; programmer macho. Alternatively, we could view the cvs-fast-export/reposurgeon combination as an instance of the design pattern <a href="http://c2.com/cgi/wiki?AlternateHardAndSoftLayers">alternate hard and soft layers</a> and draw a slightly different lesson: sometimes it&#8217;s better to manually exploit a soft layer than to add an expensive feature to a hard one.</p>