blog_post_tests/20091012224403.blog

How Not To Tackle the Mess around Forges
<p>In my previous two posts I have diagnosed a significant weakness in the open-source infrastructure. The architecture of the code behind the major SourceForge-descended hosting sites is rotten, with all kinds of nasty consequences &#8212; data seriously jailed, poor or completely absent capabilities for scripting and project migration. I said I was going to do something about it, and I&#8217;m working the problem now &#8212; actually writing code. </p>
<p>The rest of this post is not an announcement, because it will be mostly about things I have figured out I should <em>not</em> try to do. Yet. But it is a teaser. I see a path forward, and shortly I expect to have some working code to exhibit that shows the way. Actually, I have working code that attacks the problem in an interesting way <em>now</em>, but I&#8217;m still adding capabilities to make it a more impressive demonstration.</p>
<p>Here are some approaches I&#8217;ve considered, or had suggested to me by others, and rejected:</p>
<p><span id="more-1302"></span></p>
<p>1. <b>Write a new forge system, focused on import/export and scriptability, from scratch.</b> Tempting, but no. That would divert my energy for many months while the problem that originally exercised me &#8212; the data-jail effect &#8212; went unsolved. The first priority has to be jailbreaking the data in existing systems.</p>
<p>2. <b>Rebuild Savane from the inside.</b> Also tempting, and theoretically possible; I have developer privileges on that project, and it&#8217;s moribund &#8211; no commits in like two years. If I wanted to take it over, I probably could. Between <a href="https://gna.org/">gna.org</a> and <a href="http://savannah.gnu.org/">Savannah</a> it has a pretty large userbase, enough to give a functional rewrite serious cred. But, again, it would divert me from my original gripe, which was the data-jail problem. Also, Savane&#8217;s architecture inherits the <a href="http://esr.ibiblio.org/?p=1295">curse of SourceForge</a>; trying to fix it while preserving its exact appearance and functionality would be painful in the extreme.</p>
<p>3. <b>Finish the SOAP API in FusionForge, the most widely deployed &#8216;modern&#8217; descendant of SourceForge.</b> I&#8217;m now a project member there, though they haven&#8217;t given me commit privileges yet. I could fix their SOAP services API. The trouble is, their code is a disaster area worse than Savane&#8217;s &#8212; layers upon layers of cruft, so poorly integrated and maintained that their source tree <em>doesn&#8217;t even have a working &#8220;make install&#8221;!</em> I was told with a straight face that the preferred way to set up a running instance from source is to build a Debian binary package file from it and install the package. Oh, and just to put the cherry on top, something is broken in their repository &#8212; I couldn&#8217;t check out a complete source copy without running into some bizarre permissions-related error that hung my Subversion client, eating 100% of my processor. These failures cause me to doubt that the project is sufficiently well run to be a good investment of my time.</p>
<p>4. <b>Write a data-interchange standard, then jawbone existing forges into implementing exporters that speak it.</b> This is what the crowd of research types around COCLICO in France wants to do. I think it&#8217;s a doomed effort; if the &#8216;existing systems&#8217; had a strong enough architecture to support export capabilities that don&#8217;t suck we probably wouldn&#8217;t have this problem in the first place. On top of that there are the problems that writing an exporter for an existing forge requires intimate knowledge of the festering crap behind the web interfaces, and deploying it would require install privileges on the forge site. Ain&#8217;t going to happen &#8212; the site admins will, quite justifiably, wonder what the point of disturbing their running installations and accepting the inevitable security risks is when <em>nothing yet exists that can read the exports!</em> COCLICO&#8217;s is a typical over-ambitious academic approach in which you have to solve everything before it solves anything&#8230;.</p>
<p>If these approaches won&#8217;t work, what will?</p>
<p>It&#8217;s too soon to try to prescribe a standard data-interchange format for forges, but it&#8217;s not too soon to write tools that jailbreak the data out of forges and dump it in formats that are forge-type-specific but human-readable &#8212; self-describing JSON or XML dumps, rather than binary blobs. </p>
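<p>To make &#8220;forge-type-specific but self-describing&#8221; concrete, here is a minimal Python sketch of what such a dump might look like. Everything in it is an assumption for illustration: the envelope keys (<code>forge_type</code>, <code>format_version</code>, <code>record_kind</code>), the function name, and the sample tracker fields are hypothetical and not taken from any existing tool.</p>

```python
import json


def dump_tracker_item(forge_type, item):
    """Wrap one extracted record in an envelope that names its own origin.

    All key names here are hypothetical; the point is only that a reader
    can tell what produced the record without outside documentation.
    """
    envelope = {
        "forge_type": forge_type,      # hypothetical label for the source forge
        "format_version": 1,           # hypothetical version marker
        "record_kind": "tracker_item",
        "fields": item,                # forge-specific fields, passed through as-is
    }
    # indent and sort_keys keep the output stable and readable in a text editor
    return json.dumps(envelope, indent=2, sort_keys=True)


example = dump_tracker_item(
    "savane",
    {"id": 42, "summary": "Crash on export", "status": "open"},
)
print(example)
```

<p>The fields stay specific to the forge they came from; only the thin envelope is shared, which is about as far as one can go without prescribing a full interchange standard.</p>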
<p>Also, an ugly fact has to be faced &#8212; because of the PHP+SQL mess inside these things, the only viable approach to extracting the data out of them is to use the same web interfaces humans do. Once you&#8217;ve faced that fact, though, and realized you&#8217;re going to have to build a very smart robotic web scraper, the approach solves a significant set of problems. Most notably, it requires neither cooperation nor competence from any forge site administrators, anywhere.</p>
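<p>As a rough illustration of the scraping idea (not the tool hinted at here), the sketch below pulls rows out of an HTML table using only Python&#8217;s standard <code>html.parser</code> module. The sample page, the class name <code>TableScraper</code> and the helper <code>scrape_rows</code> are all hypothetical; a real scraper would also have to fetch pages over HTTP, handle logins, and cope with each forge&#8217;s particular markup.</p>

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> still belongs to the open cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:               # skip header rows, which have no <td>
                self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical stand-in for a page a human would see in a forge's bug tracker
SAMPLE_PAGE = """
<table>
  <tr><th>ID</th><th>Summary</th><th>Status</th></tr>
  <tr><td>101</td><td>Export <b>fails</b> on large projects</td><td>Open</td></tr>
  <tr><td>102</td><td>Typo in docs</td><td>Closed</td></tr>
</table>
"""


def scrape_rows(page_html):
    parser = TableScraper()
    parser.feed(page_html)
    parser.close()
    return parser.rows


rows = scrape_rows(SAMPLE_PAGE)
for row in rows:
    print(row)
```

<p>Note that nothing here needs access to the forge&#8217;s database or any help from its administrators: the only input is the same HTML a browser would receive.</p>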
<p>That&#8217;s enough hints. You&#8217;ll hear from me on this next when I announce some working tools.</p>