Beware! The Reposturgeon!

I had said I wasn’t going to do it, but…I experimented, and it turned out to be easier than I thought. Release 2.7 of reposurgeon writes (as well as reads) Subversion repositories. With the untested support for darcs, which should work exactly as well as darcs fast-export and fast-import do, this now brings the set of fully-supported version-control systems to git, hg, bzr, svn, and darcs; reposurgeon can be used for repository surgery and interconversion on any of these.

There are some significant limitations in the write-side Subversion support. For various ugly reasons having to do with the mismatch between Subversion’s ontology and that of git import streams, Subversion repositories won’t usually round-trip exactly through reposurgeon. File content histories will remain the same, but the timing of directory creations and deletions may change. The pathological things known in the Subversion world as “mixed-branch commits” are split apart at Subversion-read time and not reassembled when and if the repo state is written back out in Subversion form. Custom Subversion property settings (basically, everything but svn:ignore, svn:executable, and svn:mergeinfo) are lost on the way through. There are other problems of a similar nature, all documented in the manual.
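To make the mixed-branch split concrete, here is a minimal sketch in Python. It is not reposurgeon's actual code: the function names `branch_of` and `split_mixed_commit` are hypothetical, and it assumes the conventional trunk/branches/tags layout, which not every Subversion repository follows. It only shows the idea of turning one revision that touches several branches into one piece per branch.

```python
from collections import defaultdict


def branch_of(path):
    """Return the branch root of a path (hypothetical helper).

    Assumes the conventional trunk/branches/tags layout.
    """
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "trunk"
    if parts[0] in ("branches", "tags") and len(parts) > 1:
        return parts[0] + "/" + parts[1]
    return ""  # path lies outside any recognized branch


def split_mixed_commit(revision, paths):
    """Group one revision's changed paths into per-branch pieces."""
    groups = defaultdict(list)
    for path in paths:
        groups[branch_of(path)].append(path)
    # One (revision, branch, paths) tuple per branch touched, in sorted order.
    return [(revision, branch, sorted(files))
            for branch, files in sorted(groups.items())]


if __name__ == "__main__":
    changed = ["trunk/a.c", "branches/stable/a.c", "trunk/b.c"]
    for piece in split_mixed_commit(42, changed):
        print(piece)
```

Nothing in the resulting pieces records that they once belonged to a single revision, which is one way to see why the split is not undone on write-out.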

A particularly unfortunate problem is that mergeinfo properties may be simplified or lost. Mapping between git-space and Subversion merges is messy because a Subversion merge is more like what gitonauts call a “cherry-pick” than a git-space merge. I don’t have a general algorithm for this mapping (it’s a research-level problem!) and don’t try to handle more than the most obvious branch-merge cases.
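The cherry-pick flavor shows up in the shape of the `svn:mergeinfo` property itself: each line names a source path and a list of revisions or revision ranges merged from it, rather than pointing at a parent commit. The sketch below parses such a value in Python. It is an illustration only, not how reposurgeon handles mergeinfo; `parse_mergeinfo` and `is_single_range` are hypothetical names, and the single-range check is an assumed heuristic for spotting the "obvious" cases.

```python
def parse_mergeinfo(value):
    """Parse an svn:mergeinfo value into {source_path: [(start, end), ...]}."""
    merged = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon: the path comes first, the revision list after.
        path, _, revlist = line.rpartition(":")
        ranges = []
        for item in revlist.split(","):
            item = item.rstrip("*")  # a trailing '*' marks a non-inheritable range
            start, _, end = item.partition("-")
            ranges.append((int(start), int(end or start)))
        merged[path] = ranges
    return merged


def is_single_range(ranges):
    """Assumed heuristic: one contiguous range is the easy case to map."""
    return len(ranges) == 1


if __name__ == "__main__":
    info = parse_mergeinfo("/branches/foo:3-7,9\n/trunk:12")
    for path, ranges in info.items():
        print(path, ranges, is_single_range(ranges))
```

In the sample value, revision 8 of `/branches/foo` was skipped. A git merge commit, which records only parent pointers, has no direct way to say that.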

It could fairly be alleged that the capability to write Subversion repositories is more a cute stunt than anything that’s likely to be useful in a production situation. While I have regression tests for it that show it works on branching and merging commit graphs, I don’t think I’d actually want to trust it, yet, on a repository that wasn’t linear or only simply branching. Arcane combinations of branching, merging, and tagging could reveal subtle bugs without surprising me even slightly.

Still…having it work even as conditionally as it does seems something of an achievement. Not one I was expecting, either. I really only did it because someone on the Subversion dev list asked about write support. I wanted to reply by listing all the reasons it wouldn’t work, and then I found that I couldn’t actually make that list without trying to implement the feature. It was ever thus…

The only unconquered frontier of any significance in open-source VCSes is CVS, really. No way I’ll do write-side support for that (and I mean it this time!) but I’ve sent the maintainer of cvsps a proof-of-concept patch that almost completely implements a fast-export stream dump for CVS repositories. We’ll see where that goes.

Fear the reposturgeon!