Hackers and anonymity: some evidence

When I have to explain how real hackers differ from various ignorant media stereotypes about us, I’ve found that one of the easiest differences to explain is transparency vs. anonymity. Non-techies readily grasp the difference between showing pride in your work by attaching your real name to it versus hiding behind a concealing handle. They get what this implies about the surrounding subcultures – honesty vs. furtiveness, accountability vs. shadiness.

One of my regular commenters is in the small minority of hackers who regularly use a concealing handle. Because he pushed back against my assertion that this is unusual, counter-normative behavior, I set a bit that I should keep an eye out for evidence that would support a frequency estimate. And I’ve found some.

Recently I’ve been doing reconstructive archeology on the history of Emacs, the goal being to produce a clean git repository for browsing the entire history (yes, this will become the official repo after 24.4 ships). That history is a near-unique resource in a lot of ways.

One of the ways is the sheer length of time the project has been active. I do not know of any other open-source project with a continuous revision history back to 1985! The size of the contributor base is also exceptionally large, though not uniquely so – no fewer than 574 distinct committers. And, while it is not clear how to measure centrality, there is little doubt that Emacs remains one of the hacker community’s flagship projects.

This morning I was doing some minor polishing of the Emacs metadata – fixing up minor crud like encoding errors in committer names – and I made a list of names that didn’t appear to map to an identifiable human being. I found eight, of which two are role-based aliases – one for a dev group account, one for a build engine. That left six unidentified individual contributors (I actually shipped eight to the emacs-devel list, but two more turned out to be readily identifiable within a few minutes after that).

I was looking at this list of names, and I thought “Aha! Handle frequency estimation!”

Six out of 574 is a frequency of just about 1% for IDs that could plausibly be described as concealing handles in commit logs. That’s pretty low, and a robust difference from the cracker underground, in which 99% use concealing handles. And it’s especially impressive considering the size and time depth of the sample.

And at that, this may be an overestimate. As many as three of those IDs look like they might actually be display handles – habitual nicknames that aren’t intended as disguise. That is a relatively common behavior with a very different meaning.
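
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The function and constant names are made up for illustration; the counts (574 committers, six unidentified IDs, as many as three possible display handles) are the ones given above. It treats all six as concealing handles for the high figure, and subtracts the three possible display handles for the low one.

```python
def handle_frequency(handles: int, committers: int) -> float:
    """Return the share of committers using concealing handles, in percent."""
    if committers <= 0:
        raise ValueError("committers must be positive")
    return 100.0 * handles / committers


# Counts taken from the text; the names are illustrative only.
TOTAL_COMMITTERS = 574
UNIDENTIFIED = 6
POSSIBLE_DISPLAY_HANDLES = 3

# High figure: every unidentified ID counts as a concealing handle.
upper = handle_frequency(UNIDENTIFIED, TOTAL_COMMITTERS)

# Low figure: drop the IDs that may only be habitual nicknames.
lower = handle_frequency(UNIDENTIFIED - POSSIBLE_DISPLAY_HANDLES, TOTAL_COMMITTERS)

print(f"high: {upper:.2f}%  low: {lower:.2f}%")
```

Either way the figure stays at or below roughly one percent.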