blog_post_tests/20140517070235.blog

Managing compatibility issues in ubiquitous code
<p>There&#8217;s a recent bug filed against giflib titled <a href="https://sourceforge.net/p/giflib/bugs/58/">giflib has too many unnecessary API changes</a>.  For a service library as widely deployed as it is (basically, on everything with a screen and network access &#8211; computers, smartphones, game consoles, ATMs) this is a serious complaint.  Even minor breaks in API compatibility imply a whole lot of code rebuilds.  These are not just expensive (requiring programmer attention); they are also places for bugs to creep in.</p>
<p>But &#8220;Never change an API&#8221; isn&#8217;t a good answer either.  In this case, the small break that apparently triggered this report was motivated by a problem with writing wrappers for giflib in C# and other languages with automatic memory management.  The last round of major changes before this was required to handle GIF animation blocks correctly and make the library thread-safe.  Time marches on; service libraries have to change, and APIs with them, even when change is expensive.</p>
<p>How does one properly reconcile these pressures?  I use a small set of practice rules I think are simple and effective, and which I think are well illustrated by the way I apply them to giflib.  I&#8217;m writing about them in public because I think they generalize.</p>
<p><span id="more-5754"></span></p>
<p>First rule: if backward-compatibility is a must, fork your library into API-stable versus unstable/evolving versions.  This is why I ship both a 4.2.x giflib and a 5.x.x giflib.  The 4.2.x version is backward-compatible to the year zero; because of this, application developers get a choice and the effective cost of API breakage in the 5.x.x series decreases a great deal.</p>
<p>There are costs to this maneuver.  The main cost to you, the library developer, is that you will need to cross-port fixes from one line of development to the other.  This is acceptable for giflib, which is pretty small; it gets more difficult for larger, more complex libraries.</p>
<p>The cost to the application developers using it is more serious.  The stable version plain won&#8217;t get some fixes from the unstable version, exactly the ones that would require API changes. 4.2.x is never going to be thread-safe, and its extension-block handling is a bit flaky in edge cases.  Also, it&#8217;s easy to drop a stitch and fail to cross-port fixes that could and should be applied.</p>
<p>In the case of giflib, these are not major problems.  The 4.2.x code is very old, very stable, and has passed the test of time and wide deployment.  Apparently there was never a lot of need for thread-safety in the past, and the extension-block handling was good enough; we know these things because the rate of reported defects over the life of the project has been ridiculously low &#8211; averaging, in fact, fewer than four per <em>year</em> over a quarter century.</p>
<p>Other libraries may incur different (higher) implied costs under this strategy.  If your service code is necessarily evolving really fast, forking a stable version may not be practical because the cost of back-porting fixes is insupportable.  Engineering is tradeoffs; the point of this essay is more to raise awareness of the tradeoffs than to argue that any one rule of practice is always right.  Be aware of why you&#8217;re doing what you&#8217;re doing, and document it.</p>
<p>Second rule: Provide #defines bearing each level of the release number in your library header so that people can use compile-time conditionals in the C preprocessor to write code paths that will compile and just work with any version of the library. (There are equivalent tactics in other languages.)</p>
<p>There&#8217;s no downside to this. If you do it properly, application developers can choose to never lose back-compatibility with older versions of your library. Just as importantly, they can <em>know</em> they&#8217;ll never lose it.  This is a confidence-builder.</p>
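<p>A minimal sketch of the second rule in C. Everything named here is hypothetical: <code>MYLIB_MAJOR</code>, <code>MYLIB_MINOR</code>, <code>MYLIB_RELEASE</code>, <code>MyHandle</code>, <code>mylib_close()</code> and <code>app_close()</code> are stand-ins invented for illustration, and the assumption is that the close function gained an error-code argument in release 5.1.0 of this imaginary library. The first half plays the part of the library header; the second half is the application code that adapts to it.</p>

```c
#include <stddef.h>

/* ---- Stand-in for the library's public header (hypothetical names) ---- */
#define MYLIB_MAJOR   5
#define MYLIB_MINOR   1
#define MYLIB_RELEASE 0

#define MYLIB_ERROR 0
#define MYLIB_OK    1

typedef struct { int is_open; } MyHandle;

/* Assumed profile as of 5.1.0: error detail comes back via an out-parameter. */
int mylib_close(MyHandle *h, int *error_code)
{
    if (h == NULL || !h->is_open) {
        if (error_code != NULL)
            *error_code = 1;   /* hypothetical "not open" code */
        return MYLIB_ERROR;
    }
    h->is_open = 0;
    if (error_code != NULL)
        *error_code = 0;
    return MYLIB_OK;
}

/* ---- Application code: one wrapper, conditionalized on the version ---- */
int app_close(MyHandle *h)
{
#if MYLIB_MAJOR > 5 || (MYLIB_MAJOR == 5 && MYLIB_MINOR >= 1)
    /* 5.1.0 and later: pass somewhere for the error code to land. */
    int error_code = 0;
    return mylib_close(h, &error_code);
#else
    /* Older releases: the assumed one-argument profile. */
    return mylib_close(h);
#endif
}
```

<p>The rest of the application calls only <code>app_close()</code>, so the version test lives in exactly one place and the same source builds against either side of the change.</p>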
<p>Third rule: document, document, document.  Every API change requires an explanation.  Especially, do not ever leave your client-application developers in doubt about when an API or behavior change took place.  They need to be able to conditionalize their code properly to track your changes (see the second rule), and they can only do that if they know exactly when in your release timeline each change occurred. This, too, is a confidence builder.</p>
<p>Fourth rule: Prefer noisy breakage to quiet breakage.  The worst kind of API change is the kind that introduces an incompatible behavior change without advertising the fact.  That way lie bugs, madness, and other developers rightly cursing your name.</p>
<p>Even so, this happens a lot because library maintainers mis-estimate tradeoffs.  There&#8217;s a tendency to think that requiring users to recompile their applications (or re-link to a new major version of a shared library) is so irritating that it&#8217;s better to preserve the API by slipstreaming in changes in run-time behavior that you tell yourself will only be problematic or incompatible in rare edge cases. This belief is almost always wrong!</p>
<p>The bug report that motivated this apologia came in because the person who filed it thinks I shouldn&#8217;t have altered the argument profile of DGifClose() and EGifClose().  What he fails to understand is that I chose this path over some trickier alternatives because I <em>wanted</em> the API breakage to be noisy and obvious at compile time.  This way, the client-application builds will break once, the fix will be easy, and the result will be <em>right</em>.</p>
<p>To apply rule four in this way, it helps to have been careful about rules one through three, in order to lower the cost of the disruption.  Thus, application developers using giflib have 4.2.x to fall back on if they really can&#8217;t live with my break-it-noisily practice.</p>
<p>You also want to put in effort to make sure the fix really is easy.  Not just to save other developers work, though they&#8217;ll thank you for that; the real reason is that tricky fixes get misapplied and spawn bugs.</p>
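<p>To make the noisy kind of break concrete, here is a sketch under stated assumptions. <code>Widget</code>, <code>widget_open()</code>, <code>widget_close()</code> and the constants are hypothetical and are not giflib&#8217;s real API; the assumption is that closing frees the handle, so any error detail has to come back through a new out-parameter. Because the prototype changed, a call site still written the old way is rejected by the compiler instead of quietly doing something different at run time.</p>

```c
#include <stdlib.h>

/* Hypothetical library, invented for illustration only. */
typedef struct { int fd; } Widget;

enum { WIDGET_ERROR = 0, WIDGET_OK = 1 };
enum { WIDGET_E_NONE = 0, WIDGET_E_NULL = 1 };

Widget *widget_open(void)
{
    Widget *w = malloc(sizeof *w);
    if (w != NULL)
        w->fd = -1;
    return w;
}

/*
 * Assumed old profile, shown for comparison only:
 *     int widget_close(Widget *w);
 *
 * New profile below. With this prototype in scope, an old-style call
 * such as widget_close(w) has too few arguments and fails to compile,
 * so every affected call site is found at build time.
 */
int widget_close(Widget *w, int *error_code)
{
    if (w == NULL) {
        if (error_code != NULL)
            *error_code = WIDGET_E_NULL;
        return WIDGET_ERROR;
    }
    free(w);   /* the handle is gone after this point */
    if (error_code != NULL)
        *error_code = WIDGET_E_NONE;
    return WIDGET_OK;
}
```

<p>In this sketch the fix at each call site is mechanical: add one argument, either the address of an <code>int</code> or <code>NULL</code> if the caller does not care about the error detail.</p>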
<p>The bug reporter wants to know why I didn&#8217;t leave DGifClose() and EGifClose() as they were and introduce new entry points with the different profile.  This is a fair question, and representative of a common argument for adding complexity to library APIs rather than breaking backward compatibility.  It deserves an answer.</p>
<p>Here it is: code and API complexity are costs, too.  They&#8217;re a kind of technical debt that creeps up on you, gradually.  Each such kluge looks justified when you do it, until you turn around and discover you have an over-complex, unmaintainable, buggy mess on your hands.  I take the long view, and prefer not to let this degeneration even get started in my code!  This choice may transiently annoy people, but it&#8217;s going to lower their exposure to defects over the whole lifetime of the software.</p>
<p>Being able to take this pro-cleanliness position is an un-obvious but important benefit of open source.  The people in my distribution chain may gripe about having to do rebuilds from source, but they can do it. When you&#8217;re gluing together opaque binary blobs, the cost of API breakage is severe and you get forced into tolerating practices that will escalate code bloat and long-term defect rates.</p>