It’s been fairly well established for a long time that a small percentage of software developers are insanely more productive than others. Quite how much, who falls in this group, and why they manage to do this has been debated for just as long, but given some recent discussions, particularly surrounding K, this seemed like a topic worth returning to.
Generally when this subject comes up you can assume that whoever is talking counts themselves among the top group, but I honestly don’t know where I fall. What I can say is that the hyper-productive people I have identified, either from their public contributions or from less visible work, number fewer than ten. Not 10%, but ten, absolutely. If you lower the bar the group rapidly expands, but the dropoff in productivity is enormous.
So, who are these people, and what tricks are they using? The first non-obvious point is that they don’t all fit programmer stereotypes. In fact most of them seem more concerned about the end than the means, yet they’re insanely good at the means too; it’s just coincidence.
The easiest way to describe what it is they do is to compare software development to starting a fire. Most developers are still rubbing two sticks together. Occasionally they’ll come up with a machine that rubs two sticks together for them, and get very excited. By contrast the others are stepping back, considering what is going on, and trying to start a nuclear chain reaction. This generally means the initial part of progress appears quite tangential to what is being attempted, but rate of change increases against time.
This enables a good programmer with vision to build dramatically on what has gone before. In the case of K that means developing K itself in C (a non-trivial undertaking in itself), then developing kDb+ in K (which implements KSQL), and then using that to implement a trading platform to print money, when most people would try to develop the trading platform in C (and many do exactly that). What is so surprising about the K solution is how small, in terms of raw code, each of those steps is. Each layer creates more leverage to enable execution of the next. The art is being able to see which layers to produce in order to most quickly get to the end.
My K experience comes from university about 15 years ago, when for some still mysterious reason one of our lecturers was indoctrinating whole batches of students into the K religion. (I don’t know if this punishment is ongoing.) The overwhelming majority have the reaction you see all over the net wherever K is described, and I think this is also the initial reaction of those of us who ultimately grok it, largely because K looks uncannily like line noise. K has many of the properties normally attached to functional languages (eschewing loops, for example) and once you’ve got used to the whole array paradigm a lot of the manual manipulation required in the Algol derivatives just seems like an absolute waste of time. Indeed much of the defence of K’s inclusion in the university program was that it promoted thinking about the problem instead of the implementation. To get there, however, requires overcoming the now notorious hurdle of seemingly incomprehensible syntax.
A specific example of the genius of K’s approach is the “where” operator. This is similar to the “where” clause in SQL, but the precise semantics are more interesting, especially given the column-based style K encourages. In SQL you tend to think of data modelled in rows, but the K convention is to store the data as one array for each field of your data type, and you cross reference to get a row based on a common index across columns. You can think of K’s where operator as returning a new array of indices into your data where the condition you provided was true. For example, in an array [10,20,30,40] if your condition was greater than 25 you would receive [2,3], that is, the indices of 30 and 40 in the data. You can look up in arrays using another array as indices with the @ operator (if I’ve remembered this correctly), so to get the data that matched you combine the two operations and the result is [30,40]. On its own that’s not too interesting, but given what I said about storing data in columns it should be clear that you can use the indices array you generated from one column to look up information in another. You can also filter these indices arrays using any other operation you want, as they are just like any other array, and so it’s not too hard to see how to implement most of what you would think of as useful from SQL.
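For anyone who would rather see that without the line noise, here is a rough imitation in Python. It is only a sketch of the behaviour described above, not K itself: the helper names `where` and `at`, the `symbols` column and its values are all made up for illustration.

```python
# Column-oriented "table": one list per field, and a row is whatever
# sits at the same index in every column. The symbols are invented.
prices = [10, 20, 30, 40]
symbols = ["AAA", "BBB", "CCC", "DDD"]


def where(flags):
    """Return the indices at which the condition was true."""
    return [i for i, flag in enumerate(flags) if flag]


def at(column, indices):
    """Look up several positions of a column in one go."""
    return [column[i] for i in indices]


# Apply the condition to the whole column, then ask where it held.
hits = where([p > 25 for p in prices])

matched_prices = at(prices, hits)    # values from the same column
matched_symbols = at(symbols, hits)  # same indices, different column

# The index array is just another list, so it can be filtered again.
narrowed = [i for i in hits if symbols[i] != "DDD"]

print(hits, matched_prices, matched_symbols, narrowed)
```

The point to notice is that `hits` is computed once from one column and then reused against another, which is roughly what a SQL query does when it selects one field and filters on a different one.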
Many experienced programmers will be severely angry by this point, mainly because of the standard observation that the languages I’m going on about have a justified reputation for being write-only languages. A lot of the better devs I’ve worked with will confess at some point that they find abandoning objects and using raw arrays, in whatever language they find themselves in, liberating, resulting in bursts of productivity, even if it goes against accepted best practice for that language. By contrast C++ and especially Java or C# encourage a level of procrastination through the class system that is at times inescapable. (One deeply ironic example is a recent effort of mine to contribute to the open source community getting stuck in a tarpit attempting to simulate what I can remember of K’s array system in Java, ultimately getting temporarily derailed by some commercial projects.)
The irony is that many will look at the likes of K and say, “What is the point of learning such an esoteric development environment?” without realising how arbitrary and unnecessary a lot of the knowledge required to develop on Linux, Android or iOS actually is. Linux system administration, in particular, can give the illusion of learning useful things, but actually you’re just learning how to deal with whatever the latest pile of crap that’s come down the line is, before it’s inevitably replaced with the next one. The problem is there is such a high rate of new stuff requiring attention and learning that most people never stop and realise they are wasting their whole time learning new shit without actually getting anything useful done at all. Take some time out, and work out how to start from where you are and catapult yourself into the future. Ignore the other stuff because it will become unnecessary faster than you think.
Since I managed to weasel Ray Mears in up there I thought I’d end with yet another link to Alan Kay, who appears to be getting blunter with age, and is always worth listening to:
Got a software gripe? Send it to Nigel Birkenshaw at [email protected]