Revenge of the reposturgeon!
Reposurgeon 1.5 is out. This is a major release based on experience gained converting the roundup repository.
The main new feature is code to help in fixing up fossil CVS and SVN commit references, turning them into action stamps. I had to think about the design carefully here, because the task combines a front end that humans do better than machines with a back end that machines do much better than humans.
The problem: You’ve lifted your Subversion repo to git with git-svn or some similar tool. But the comments still have references in them that look like, say, “r2317”. You want to replace these to point to the corresponding changesets in the git repo, but there may be lots of them, and even if patching each one by hand weren’t a huge pain in the ass it’s a fiddly job at which your error rate is likely to be significant.
The part of this task humans are good at is recognizing from context all the random forms a reference can take. There’s the canonical “r2317”, but also “SVN#2317”, “commit 2317”, “rev 2317”, and other variants. Machines aren’t good at reducing ambiguity; I passed on solving the strong-AI problem and designed for a workflow in which the human first replaces all these variants with a uniform machine-parseable cookie – to wit, “[[SVN:2317]]”.
Then the machine does what it’s good at, which is crunching through the logic to replace that cookie with an action-stamp pointing at the same changeset. A human doing this by hand would be prone to boredom-induced detail errors and typos.
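To make the machine half concrete, here is a minimal Python sketch of the cookie substitution. It is not reposurgeon's actual code. The function name, the `stamps` dictionary, and the sample stamp value are all made up for illustration, and it assumes an action stamp is simply a timestamp and a committer ID joined by "!".

```python
import re

# Cookie syntax from the post: [[SVN:2317]]
SVN_COOKIE = re.compile(r"\[\[SVN:(\d+)\]\]")

def replace_svn_cookies(comment, stamps):
    """Replace each [[SVN:n]] cookie with the action stamp for revision n.

    `stamps` is a hypothetical dict mapping revision number -> action stamp.
    Cookies with no known revision are left in place so a human can spot them.
    """
    def substitute(match):
        revision = int(match.group(1))
        return stamps.get(revision, match.group(0))
    return SVN_COOKIE.sub(substitute, comment)

# Illustrative data only; the stamp format is an assumption.
stamps = {2317: "2011-10-25T15:11:09Z!fred@example.com"}
print(replace_svn_cookies("Fixes the bug introduced in [[SVN:2317]].", stamps))
```

Leaving unresolved cookies untouched rather than guessing keeps the failure mode visible, which matters when the whole point is to avoid silent detail errors.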
How does it get the mapping from revision number to time!committer? One way is if the changeset comments contain metadata put there specifically to support this, which is what git-svn does. (reposurgeon also has a command to strip out all that metadata when you’re done with it.)
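As a rough illustration of the git-svn case: git-svn appends a "git-svn-id:" line to each converted commit comment, containing the repository URL, "@", the revision number, and the repository UUID. The sketch below mines those lines into a revision-to-stamp map. The function name and the (timestamp, committer, comment) tuple layout are hypothetical, and this is not how reposurgeon itself is written.

```python
import re

# git-svn appends a trailer like this to each converted commit comment:
#   git-svn-id: https://svn.example.com/repo/trunk@2317 <repository-uuid>
GIT_SVN_ID = re.compile(r"^git-svn-id: \S+@(\d+) \S+$", re.MULTILINE)

def build_stamp_map(commits):
    """Map SVN revision number -> 'time!committer' string.

    `commits` is a hypothetical iterable of (timestamp, committer, comment)
    tuples; commits without a git-svn-id line are skipped.
    """
    stamps = {}
    for timestamp, committer, comment in commits:
        match = GIT_SVN_ID.search(comment)
        if match:
            stamps[int(match.group(1))] = f"{timestamp}!{committer}"
    return stamps
```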
Or – and here’s the tricky part – reposurgeon will sometimes be able to mine that information out of CVS keyword expansions. So, to take a real-world example from the roundup repo, let’s say a blob in the repo has this string in it:
$Id: ru.po,v 1.6 2004-07-03 13:51:03 a1s Exp $
Then reposurgeon knows that the reference cookie [[CVS:ru.po:1.6]] should be replaced by an action-stamp pointing at whatever commit this blob is attached to (if there are two or more such commits it just grabs the first; this is a bug and I’ll fix it in 1.6).
UPDATE: Duh…I had already fixed it to throw an error in that case! What I need to do is try to correctly handle cases where all but one of the possibilities can be discarded because they refer to the wrong branches.
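For the curious, the keyword-mining idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not reposurgeon's implementation: the function names and the (commit_id, text) pair layout are invented, and the regular expression only handles the plain $Id$ form shown above. It raises an error when a cookie matches more than one commit, mirroring the behavior described in the update; it makes no attempt at the branch-based disambiguation.

```python
import re
from collections import defaultdict

# Matches expansions like: $Id: ru.po,v 1.6 2004-07-03 13:51:03 a1s Exp $
CVS_ID = re.compile(r"\$Id: (\S+),v (\d+(?:\.\d+)+) ")

def index_cvs_cookies(blobs):
    """Map '[[CVS:file:rev]]' -> set of commit ids whose blobs carry that $Id$.

    `blobs` is a hypothetical iterable of (commit_id, text) pairs.
    """
    index = defaultdict(set)
    for commit_id, text in blobs:
        for filename, revision in CVS_ID.findall(text):
            index[f"[[CVS:{filename}:{revision}]]"].add(commit_id)
    return index

def resolve(cookie, index):
    """Return the single commit a cookie refers to; raise if none or several."""
    commits = index.get(cookie, set())
    if len(commits) != 1:
        raise ValueError(f"{cookie} matches {len(commits)} commits")
    return next(iter(commits))
```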
UPDATE: Mike Swanson pointed out a Python 3 compatibility problem, so I snap-released 1.6 about a day later.