Request for code review: cvs-fast-export

Sometimes reading code is really difficult, even when it’s good code. I have a challenge for all you hackers out there…

cvs-fast-export translates CVS repositories into a git-fast-export stream. It does a remarkably good job, considering that (a) the problem is hard and grotty, with weird edge cases, and (b) the codebase is small and written in C, which is not the optimal language for this sort of thing.

It does a remarkably good job because Keith Packard wrote most of it, and Keith is a brilliant systems hacker (he co-designed X and wrote large parts of it). I wrote most of the parts Keith didn’t, and while I like to think my contribution is solid, it doesn’t approach his in algorithmic density.

Algorithmic density has a downside. There are significant parts of Keith’s code I don’t understand. Sadly, Keith no longer understands them either. This is a problem, because there are a bunch of individually small issues which (I think) add up to one conclusion: the core code needs work. Right now, neither I nor anyone else has the knowledge required to do that work.

I’ve just spent most of a week trying to acquire and document that knowledge. The result is a file called “hacking.asc” in the cvs-fast-export repository. It documents what I’ve been able to figure out about the code. It also lists unanswered questions. But it is incomplete.

It won’t be complete until someone can read it and know how to intelligently modify the heart of the program – a function called rev_list_merge() that does the hard part of merging cliques of CVS per-file commits into a changeset DAG.

The good news is that I’ve managed to figure out and document almost everything else. A week ago, the code for analyzing CVS masters into in-core data objects was trackless jungle. Now, pretty much any reasonably competent C systems programmer could read hacking.asc and the comments and grasp what’s going on.

More remains to be done, though, and I’ve hit a wall. The problem needs a fresh perspective, ideally more than one. Accordingly, I’m requesting help. If you want a real challenge in comprehending C code written by a master programmer – a work of genius, seriously – dive in.

https://gitorious.org/cvs-fast-export/

There’s the repository link. Get the code; it’s not huge, only 10KLOC, but it’s fiendishly clever. Read it. See what you can figure out that isn’t already documented. Discuss it with me. I guarantee you’ll find it an impressive learning experience – I have, and I’ve been writing C for 30 years.

This challenge is recommended for intermediate to advanced C systems programmers, especially those with an interest in the technicalia of version-control systems.