doclifter 2.8 is released
In response to a bug report that was relatively easily fixed, I’ve just shipped release 2.8 of doclifter, a program that takes troff-based document markups – including man page markup – and lifts them to DocBook XML.
In doclifter I’ve come as close as I probably ever will to building an AI :-). Automatically lifting the grotty presentation-level goo in man pages (and other troff macro sets) to structural markup is hard – actually, the DocBook user community thought high-quality translations without extensive manual intervention by a human were impossible until I did it. But it turns out that clever parsing and a whole lot of cliche analysis are good enough for about 97% of the real-world cases, and doclifter can throw useful warnings for the other 3%.
This is, by the way, a useful tool even if you’re not interested in DocBook, because DocBook is kind of like Earth orbit – it’s halfway to anywhere. In particular, man to DocBook via doclifter followed by DocBook to HTML with the stock stylesheets produces better HTML than any of the half-dozen direct man-to-HTML converters out there. This is because none of them actually do much structural analysis – they’re mostly converting presentation-level cliches in troff to presentation-level cliches in HTML. They’re also deficient in handling troff special characters, which doclifter maps to XML Unicode literals.
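To give a feel for the special-character part of the job, here is a minimal Python sketch of mapping troff glyph escapes to XML numeric character references. It is not doclifter's actual code: the table `TROFF_GLYPHS`, the pattern `GLYPH_RE`, and the function `lift_special_chars` are hypothetical names invented for illustration, and the table covers only a handful of the glyph names a real translator has to know.

```python
import re

# Illustrative subset of troff special-character names (hypothetical table,
# not doclifter's own). A real translator needs a much larger mapping.
TROFF_GLYPHS = {
    "em": "\u2014",  # em dash
    "en": "\u2013",  # en dash
    "co": "\u00a9",  # copyright sign
    "rg": "\u00ae",  # registered sign
    "bu": "\u2022",  # bullet
    "lq": "\u201c",  # left double quotation mark
    "rq": "\u201d",  # right double quotation mark
}

# Matches the two-character form \(xx and the bracketed form \[name].
GLYPH_RE = re.compile(r"\\\((..)|\\\[([^\]]+)\]")


def lift_special_chars(text):
    """Replace known troff glyph escapes with XML numeric character references."""
    def replace(match):
        name = match.group(1) or match.group(2)
        char = TROFF_GLYPHS.get(name)
        if char is None:
            # Unknown escape: leave it alone so a later pass can warn about it.
            return match.group(0)
        return "&#x{:04X};".format(ord(char))

    return GLYPH_RE.sub(replace, text)


if __name__ == "__main__":
    print(lift_special_chars(r"Copyright \(co 2010 \(em all rights reserved"))
```

Leaving unrecognized escapes untouched, rather than guessing, is a deliberate choice in this sketch: it keeps the output honest and gives a later stage something concrete to warn about.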
If you run an open-source project, and your documentation masters are still in troff, please use doclifter to fix this (yes, you will be able to make man pages from the XML after lifting them). That’s why I wrote it – every project that switches to DocBook XML improves our ability to present good-looking and properly hyperlinked documentation over the Web.