blog_post_tests/20141016115300.blog
A low-performance mystery: Sometimes you gotta simplify
<p>This series of posts is increasingly misnamed, as there is not much mystery left about cvs-fast-export&#8217;s performance issues and it is now blazingly, screamingly, bat-out-of-hell fast. As in: both the threaded and unthreaded versions convert the entire history of groff (15593 CVS deltas in 1549 files) in 13 seconds flat. That would be about 10K CVS commits per minute, sustained; in practice the throughput will probably fall off a bit on very large repositories.</p>
<p>I achieved the latest doubling in speed by not succumbing to the temptation to overengineer &#8211; a trap that lies in wait for all clever hackers. Case study follows.</p>
<p><span id="more-6385"></span></p>
<p>To review, for each master there&#8217;s a generation loop that runs to produce all its revision snapshots. Because CVS stores deltas in reverse (that is, the tip node contains the entire most recent revision, with the deltas composing backward to an empty file) the snapshots are emitted in reverse order. </p>
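<p>The backward composition can be sketched like this. This is a toy model in Python; the names and the delta representation are invented for illustration and are not cvs-fast-export&#8217;s actual data structures:</p>

```python
# Hypothetical sketch of reverse-delta storage: the tip node holds the full
# latest revision, and each delta, applied in turn, reconstructs the
# *previous* revision -- so snapshots necessarily come out newest-first.

def emit_snapshots(tip_text, deltas):
    """Yield revision snapshots in reverse (newest-first) order.

    Each delta is a function mapping a revision's text to its
    predecessor's text -- a stand-in for RCS edit scripts.
    """
    text = tip_text
    yield text                      # most recent revision first
    for delta in deltas:            # compose backward toward the root
        text = delta(text)
        yield text

# Toy deltas: each strips the last line, mimicking a shrinking history.
deltas = [
    lambda t: t.rsplit("\n", 1)[0],
    lambda t: t.rsplit("\n", 1)[0],
]
snapshots = list(emit_snapshots("line1\nline2\nline3", deltas))
# snapshots[0] is the tip revision, snapshots[-1] the oldest
```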
<p>These snapshots are then stashed in a temp directory to be picked up and copied out in the correct (canonical git-fast-export) order &#8211; forward, and just in time for the commits that reference them. </p>
<p>The reason for this generate-then-copy sequence (which doubles the program&#8217;s I/O traffic) was originally twofold. First, I wanted the output streams to look as much as possible like what git-fast-export would ship from an equivalent git repository. Second, if you&#8217;re going to make incremental dumping work (give me a stream of all commits since time T) you <em>must</em> use this canonical order. For some people this is a must-have feature.</p>
<p>Then, when I added multithreading, the temp files acquired a second use. They were a place for worker threads to drop snapshots without stepping on each other.</p>
<p>When I last posted, I was preparing to attempt a rather complicated change in the code. To get rid of the temp files, but preserve canonical ordering, my plan was to pick apart that generation loop and write a function that would just give me snapshot N, where N is the number of revisions from base. I could then use this to generate blobs on demand, without I/O traffic.</p>
<p>In preparation for this change, I separated snapshot generation from master analysis and moved it into stage 3, just before export-stream generation. When I did this, and profiled, I noticed a couple of things.</p>
<p>(1) The analysis phase became blisteringly fast. To the point where it could parse and analyze 15,000 CVS masters in less than a second. The output traffic to write the snapshots had been completely dominating not just the analysis computation but the input cost to read the masters as well.</p>
<p>(2) Snapshot writes were no longer threaded &#8211; and that didn&#8217;t make a damn bit of difference to the throughput. Snapshot generation &#8211; the most compute-intensive part of the program &#8211; was <em>also</em> completely dominated by I/O time. So the utility of the temp files began to look at best questionable.</p>
<p>(3) Threading stopped making any noticeable difference in throughput, either positive or negative.</p>
<p>Reality was trying to tell me something. The something was this: forget being clever about threading and incremental blob generation in core. It&#8217;s too complicated. All you need to do is cut the snapshot I/O traffic. Ditch the canonical dump order and ship the snapshots as they&#8217;re made &#8211; never do a copy.</p>
<p>Keep it simple, stupid!</p>
<p>That is what I did. I couldn&#8217;t give up on canonical order entirely; it&#8217;s still needed for incremental dumping, and it&#8217;s handy for testing. But the tool now has the following behavior:</p>
<p>* Below a certain size threshold (byte volume of master files) it defaults to dumping in canonical order, with temp file copies.</p>
<p>* Above that size, it dumps in fast order (all blobs first), no copying. </p>
<p>* There are -C and -F command-line options to force the dump style.</p>
<p>The threshold size is set so that canonical order is only defaulted to when the resulting dump will take almost no time even with the copies.</p>
<p>The groff repo is above the threshold size. The robotfindskitten repo is below it. So are my regression-test repos. Yes, I did add a regression test to check that canonical-order and fast-order conversions are equivalent! </p>
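<p>To make the two dump styles concrete, here is a schematic comparison of the stream orderings. This is illustrative only &#8211; the marks, filenames, and elided fields are invented, not output from an actual conversion:</p>

```
# Canonical order: each blob is emitted just before the commit that needs it.
blob
mark :1
data ...
commit refs/heads/master
mark :2
...
M 100644 :1 foo.c

# Fast order: all blobs are shipped up front, commits follow in a batch,
# so no snapshot ever needs to be stashed and copied later.
blob
mark :1
data ...
blob
mark :2
data ...
commit refs/heads/master
...
M 100644 :1 foo.c
M 100644 :2 bar.c
```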
<p>And I think that brings this saga nearly to a close. There is one more optimization I might try &#8211; the bounded-queue-with-writer-thread thing some of my regulars suggested. I frankly doubt it&#8217;ll make a lot of difference; I&#8217;ll probably implement it, profile to show that, and then remove it to keep the code simple.</p>
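<p>For the curious, the bounded-queue idea, as I understand the suggestion, looks roughly like this. This is a sketch with invented names, not cvs-fast-export internals: worker threads generate snapshots, a single writer drains a bounded queue so output is serialized without temp files, and the bound applies back-pressure to cap memory use:</p>

```python
# Sketch of the bounded-queue-with-writer-thread pattern (illustrative
# only): many producers, one consumer that owns the output channel.
import queue
import threading

def run(snapshot_ids, workers=4, bound=8):
    q = queue.Queue(maxsize=bound)   # bound applies back-pressure
    written = []

    def worker(ids):
        for i in ids:
            q.put(f"snapshot-{i}")   # blocks when the queue is full

    def writer(expected):
        for _ in range(expected):
            written.append(q.get())  # the only thread touching output
            q.task_done()

    w = threading.Thread(target=writer, args=(len(snapshot_ids),))
    w.start()
    chunks = [snapshot_ids[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    w.join()
    return written

out = run(list(range(20)))
# every snapshot reaches the writer exactly once; arrival order varies
```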
<p>This does not, however, mean that the bucks people threw at the Help Stamp Out CVS In Your Lifetime fund were misdirected. I&#8217;m going to take the combination of cvs-fast-export and a fast engine to run it on in hand, and use it. I shall wander the net, hunting down and <s>killing</s> converting the last CVS repositories. Unlike hunting mammoths to extinction, this will actually be a <em>good</em> thing.</p>