Looking for reposurgeon test cases

I just released reposurgeon 1.2 and am continuing to develop the tool. In order to test some of the newer features, I’m looking for repository conversions to do. If you run an open-source project that is still using CVS or Subversion, or some odd non-distributed VCS, I may be willing to lift it to git for you (and from git to any other DVCS you might prefer is a pretty small step). Details of this offer follow; limited time only, first come, first served.

(Why have me do it? Well…especially for older projects with a complex revision history, it’s a messy and daunting job. The tools are somewhat flaky, the difference between a sloppy conversion and a good one is significant, and good conversions require experience and judgment.)

The ideal test for reposurgeon is a Subversion repository of a project that was formerly CVSed and contains a lot of junk commits and artifacts generated by cvs2svn conversion. I’d also like to lift at least one project now in CVS so I can get a good feel for how cvs2svn behaves today (I know it has substantial improvements over older versions because I wrote at least one of those improvements myself).

The conversion process will look like this:

1. If starting from CVS, I’ll make a preliminary conversion with git-cvsimport. If starting with Subversion, I’ll do the preliminary conversion with git-svn. If your repository is in something weird, I’ll need to either find a lifting tool, or possibly build one, or tell you it’s more work than I’m willing to do.

2. This is the interesting part: clean up the mess. Up-converted repos tend to be full of conversion artifacts. For example, many versions of cvs2svn mechanically generate commits to represent CVS release tags; a high-quality conversion should create actual tag objects corresponding to the junk commits and delete the junk. Also, any commit references in the change comments need to be fixed up (generally I convert things like Subversion revision numbers to committer + date stamp).
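To make the commit-reference fixup concrete, here is a minimal Python sketch of the idea. It is not reposurgeon's code: the `REVISIONS` table, the `rewrite_refs` function, the `rNNNN` reference pattern, and the bracketed output format are all assumptions made up for illustration. A real conversion would build the table from the repository history.

```python
import re

# Hypothetical lookup table: Subversion revision number -> (committer, date).
# In a real conversion this would be derived from the repository history.
REVISIONS = {
    1234: ("alice@example.com", "2011-02-03T14:05:09Z"),
}

# Assumed reference style: a bare "r" followed by digits, e.g. "r1234".
REV_REF = re.compile(r"\br(\d+)\b")

def rewrite_refs(comment, revisions):
    """Replace rNNNN references with a committer + date stamp where known."""
    def substitute(match):
        rev = int(match.group(1))
        if rev not in revisions:
            # Leave references we cannot resolve untouched.
            return match.group(0)
        committer, date = revisions[rev]
        return f"[{committer} {date}]"
    return REV_REF.sub(substitute, comment)

print(rewrite_refs("Fixes the bug introduced in r1234.", REVISIONS))
```

Unresolvable references are deliberately left alone in this sketch, so a human can find and judge them later rather than have them silently mangled.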

The result of a really good after-conversion cleanup looks as though the project had been using git from day one. I’ve done several of these now, mostly on my own projects but recently for the Roundup bug tracker. Each time I do one of these, reposurgeon gets better – more features, bugs exposed and fixed. That’s the point; reposurgeon is a good tool, and I want to case-harden it into a great one.

There are some conditions on this offer.

First and most importantly, I want the result to be used. A conversion typically involves three to four days of hard work. If your repo has a kind of cruft or malformation in it that I haven’t seen before, well, teaching reposurgeon to deal with that is the point of the exercise, but it also means the conversion may take longer. A precondition for me to put in that kind of work is that the political ducks have to be lined up first – the project has to have decided to move and be willing to use the results. (Yes, the project should exercise due diligence to verify that I haven’t screwed up; that’s a different issue.)

I’m only willing to do a limited number of these, so if I get a flood of requests I’m going to be choosy. Preference will go to projects that are older and/or more important and/or larger. The ideal candidate would be an important piece of open-source infrastructure with a long, messy history rooted in CVS or RCS or SCCS.

If you want it, conversion from git to another DVCS (hg, bzr, whatever) is your problem. I’ll point you at tools, but the only part I’m interested in is already done when you have your git repo.

Again, the sort of capability I’m looking to improve in reposurgeon is automated recognition and cleanup of conversion cruft. I may experiment with features like branch merge detection if conditions seem right.

UPDATE: When you make your request, please have the following things ready:

1. A repository-access URL.

2. An authors file mapping local user IDs to email addresses and user names (the git up-conversion needs this).
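If you want to sanity-check your authors file before sending it, something like the following Python sketch may help. It assumes the one-entry-per-line `login = Full Name <email>` layout that git-svn and git-cvsimport accept for their authors-file options. The `parse_authors` function and the sample names are hypothetical, and the pattern is stricter than the importers themselves may be.

```python
import re

# Assumed line layout: login = Full Name <email>
AUTHOR_LINE = re.compile(
    r"^(?P<login>[^\s=]+)\s*=\s*(?P<name>[^<>]+?)\s*<(?P<email>[^<>\s]+)>$"
)

def parse_authors(text):
    """Map each local user ID to a (full name, email) pair.

    Blank lines are skipped; any other line that does not fit the
    assumed layout raises ValueError naming the offending line number.
    """
    authors = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        match = AUTHOR_LINE.match(line)
        if match is None:
            raise ValueError(
                f"line {number}: expected 'login = Full Name <email>'"
            )
        authors[match["login"]] = (match["name"], match["email"])
    return authors

sample = "alice = Alice Example <alice@example.com>\nbob=Bob Example <bob@example.com>\n"
print(parse_authors(sample))
```

Failing loudly on a malformed line is a deliberate choice here: a missing mapping is much cheaper to fix before the conversion than after.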