Announcing cvssync, with thoughts on “good enough”
<p>There&#8217;s an ancient Unix maxim to the effect that a tool that gets 85% of your job done now is preferable to one that gets 100% done never. Sometimes chasing corner cases is more work than the problem really justifies.</p>
<p>In today&#8217;s dharma lesson, I shall illustrate this principle with a real-world and useful example.</p>
<p><span id="more-5173"></span></p>
<p>In my last blog post I explained why I had to shoot cvsps through the head. Some of my regulars regretted the loss of the good feature bolted to its crappy repo-analysis code &#8211; it could fetch remote CVS repository metadata for analysis rather than requiring them to have been already mirrored locally.</p>
<p>To fill this functional gap, I needed a tool for mirroring the contents of a remote CVS repository to a local directory. There&#8217;s floating folklore to the effect that a tool called &#8220;cvssuck&#8221; does this job, but when I tried to use it, it failed in about the most annoying possible way. It mirrored the directory structure of the remote site without fetching any masters!</p>
<p>Upon investigation I discovered that the cvssuck project site has disappeared and there hasn&#8217;t been a release in years. Disgusted, I asked myself how it could possibly have become that broken. Seemed to me the whole thing ought to be a trivial wrapper around <a href="http://en.wikipedia.org/wiki/Rsync">rsync</a>.</p>
<p>Or&#8230;maybe not. What scanty documentation I found for cvssuck made a big deal out of the fact that it (inefficiently) used CVS itself to fetch masters. That wouldn&#8217;t make any sense if the masters were rsync-accessible, because then rsync would be a much faster and more efficient way to do the same job.</p>
<p>But I thought about the sites I generally have to fetch from when I&#8217;m grabbing CVS repositories for conversion, as I did most recently for the groff project. SourceForge. Savannah. These sites (and, I suspect, most others that still support CVS) do in fact allow rsync access so that project administrators can use it to do offsite backups.</p>
<p>OK, so suppose I write a little wrapper around rsync to fetch from these sites. It might not do the guaranteed fetch that cvssuck advertises&#8230;but on the other hand cvssuck does not seem to actually <em>work</em>, at least not any more. What have I got to lose?</p>
<p>About an hour of experimentation and 78 lines of Python code later, I had learned a few things. First, a stupid-simple wrapper around rsync does in fact work for SourceForge and Savannah. And second, there is a small but significant value the wrapper can add.</p>
<p>The only thing you are pretty much guaranteed to be able to find out about a CVS repository is the CVS command needed to check out a working copy. For example, the groff CVS page gives you this command:</p>
<pre language="sh">
cvs -z3 -d:pserver:anonymous@cvs.savannah.gnu.org:/sources/groff co &lt;modulename&gt;
</pre>
<p>You have to figure out for yourself that the &lt;modulename&gt; should also be &#8220;groff&#8221;, but there are clues to that on the web page. For those of you blessed enough to be unfamiliar with CVS, a single instance can host multiple projects that can be checked out separately; the module name selects one of these.</p>
<p>It isn&#8217;t necessarily clear how to get from that cvs invocation to an rsync command. Here&#8217;s how you do it. First, lop off the &#8220;anonymous@&#8221; part; that is a dummy login credential. Treat &#8220;/sources/groff&#8221; as a file path to the repository directory, then realize that the module is a subdirectory. You end up writing this:</p>
<pre language="sh">
rsync -avz cvs.savannah.gnu.org:/sources/groff/groff my-local-directory
</pre>
<p>That&#8217;s really simple, but it turns out not to work on SourceForge, because SourceForge runs an rsync daemon and hides the absolute file path to the repository. The corresponding fetch from SourceForge, if groff existed there, would look like this:</p>
<pre language="sh">
rsync -avz groff.cvs.sourceforge.net::cvsroot/groff/groff groff
</pre>
<p>Note the double colon and absence of leading &#8216;/&#8217; on the repository path.</p>
<p>The value a wrapper script can add is knowing about these details so you don&#8217;t have to. Thus, cvssync. You call it with the arguments you would give a CVS checkout command. It pulls those apart, looks at the hostname, figures out how to reassemble the elements into an rsync command, and runs that.</p>
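<p>Here is a minimal sketch of that pull-apart-and-reassemble step. It is not the cvssync source: the function name, the regular expression, and the test for a SourceForge hostname are all my assumptions for illustration, as is the SourceForge CVS root in the usage note. It only builds the rsync argument list for the two rules described above and does not run anything.</p>

```python
import re

# Hypothetical pattern for a pserver-style CVS root such as
# ":pserver:anonymous@cvs.savannah.gnu.org:/sources/groff".
# The "user@" part is optional and is thrown away.
CVSROOT_RE = re.compile(r"^:pserver:(?:[^@/]+@)?([^:/]+):?(/.*)$")


def rsync_command(cvsroot, module, target):
    """Build an rsync argument list from the parts of a CVS checkout command.

    cvsroot is the value given to "cvs -d", module is the module name,
    and target is the local directory to mirror into.
    """
    match = CVSROOT_RE.match(cvsroot)
    if match is None:
        raise ValueError("unrecognized CVS root: " + cvsroot)
    host, path = match.groups()
    if host.endswith(".sourceforge.net"):
        # Assumed test for SourceForge. It runs an rsync daemon, so the
        # source uses a double colon and no leading slash on the path.
        source = "%s::%s/%s" % (host, path.lstrip("/"), module)
    else:
        # General rule: repository path plus module as a subdirectory.
        source = "%s:%s/%s" % (host, path, module)
    return ["rsync", "-avz", source, target]
```

<p>Called with the groff root from the Savannah example, the module name &#8220;groff&#8221;, and &#8220;my-local-directory&#8221;, this yields the same rsync invocation shown earlier. A root on a host ending in &#8220;.sourceforge.net&#8221; takes the double-colon branch instead. Supporting another hosting site would mean adding one more branch.</p>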
<p>This just shipped with cvs-fast-export release 0.7. At the moment it really only knows two things: a special rule about building rsync commands for SourceForge, and a general rule that happens to work for Savannah and should for most other CVS sites as well. More hosting-site rules would be easy to add, a line or two of Python at most for each hosting site.</p>
<p>This wrapper doesn&#8217;t do the last 15% of the job; it will fail if the CVS host blocks rsync or has an unusual directory structure. But that 85% now is more valuable than 100% never, especially when its capabilities are so easily extended. </p>
<p>And hey, it only took an hour for me to write, test, document, and integrate into the cvs-fast-export distribution. This is the Great Way of Unix; heed the lesson.</p>