Running Code and Perpetual Beta

When Google’s Gmail service finally exited beta status in July 2009, five years after it was launched, it already had over 30 million users. By then, it was the third largest free email provider after Yahoo and Hotmail, and was growing much faster than either.[1] For most of its users, it had already become their primary personal email service.

The beta label on the logo, indicating experimental prototype status, had become such a running joke that when it was finally removed, the project team included a whimsical “back to beta” feature, which allowed users to revert to the old logo. That feature itself was part of a new section of the product called Gmail Labs: a collection of settings that allowed users to turn on experimental features. The idea of perpetual beta had morphed into permanent infrastructure within Gmail for continuous experimentation.

Today, this is standard practice: all modern web-based software includes scaffolding for extensive ongoing experimentation within the deployed production site or smartphone app backend (and beyond, through developer APIs[2]). Some of it is even visible to users. In addition to experimental features that allow users to stay ahead of the curve, many services also offer “classic” settings that allow them to stay behind the curve — for a while. The best products use perpetual beta as a way to lead their users towards richer, more empowered behaviors, instead of following them through customer-driven processes. Backward compatibility is limited to situations of pragmatic need, rather than being treated as a religious imperative.
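To make the idea concrete, here is a minimal sketch of the kind of scaffolding involved, written in Python. The flag names, rollout percentages, and hashing scheme are illustrative assumptions, not a description of Gmail or any particular product; the point is simply that an “experimental” and a “classic” behavior can coexist in the same deployed codebase, with users bucketed between them.

```python
import hashlib

# Hypothetical feature-flag registry. Each experimental feature is
# rolled out to a fraction of users rather than to everyone at once.
FLAGS = {
    "labs_undo_send": 0.05,  # experimental: 5% of users, "ahead of the curve"
    "classic_inbox": 1.00,   # legacy fallback: available to all, "behind the curve"
}

def is_enabled(flag: str, user_id: str) -> bool:
    """Deterministically bucket a user into or out of an experiment."""
    rollout = FLAGS.get(flag, 0.0)
    # Hash flag + user so each user lands in a stable, independent
    # bucket per flag, and redeploys don't reshuffle the experiment.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return digest[0] / 256.0 < rollout

# In request-handling code, the experiment is just a branch:
def render_inbox(user_id: str) -> str:
    if is_enabled("labs_undo_send", user_id):
        return "inbox with experimental undo-send"
    return "stable inbox"

print(render_inbox("user-42"))
```

Turning an experiment up, down, or off then becomes a configuration change rather than a separate release, which is what makes continuous experimentation cheap.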

The Gmail story contains an answer to the obvious question about agile models you might ask if you have only experienced waterfall models: How does anything ambitious get finished by groups of stubborn individuals heading in the foggiest possible direction of “maximal interestingness” with neither purist visions nor “customer needs” guiding them?

The answer is that it doesn’t get finished. But unlike in waterfall models, this does not necessarily mean the product is incomplete. It means the vision is perpetually incomplete and growing in unbounded ways, due to ongoing evolutionary experiments. When this process works well, what engineers call technical debt can get transformed into what we might call technical surplus.[3] The parts of the product that lack satisfying design justifications represent the areas of rapid innovation. The gaps in the vision are sources of serendipitous good luck. (If you are a Gmail user, browsing the “Labs” section might lead you to some serendipitous discoveries: features you did not know you wanted might already exist unofficially.)

The deeper significance of perpetual beta culture in technology often goes unnoticed: in the industrial age, engineering labs were impressive, enduring buildings inside which experimental products were created. In the digital age, engineering labs are experimental sections inside impressive, enduring products. Those who bemoan the gradual decline of famous engineering labs like AT&T Bell Labs and Xerox PARC often miss the rise of even more impressive labs inside major modern products and their developer ecosystems.

Perpetual beta is now such an entrenched idea that users expect good products to evolve rapidly and serendipitously, continuously challenging their capacity to learn and adapt. They accept occasional non-critical failures as a price worth paying. Just as the ubiquitous “under construction” signs on the early static websites of the 1990s gave way to dynamic websites that were effectively always “under construction,” software products too have acquired an open-ended evolutionary character.

Just as rough consensus drives ideation towards “maximal interestingness,” agile processes drive evolution towards the regimes of greatest operational uncertainty, where failures are most likely to occur. In well-run modern software processes, not only is the resulting chaos tolerated, it is actively invited. Changes are often deliberately made at seemingly the worst possible times. Intuit, a maker of tax software, has a history of making large numbers of changes and updates at the height of tax season.

Conditions that cause failure, instead of being cordoned off for avoidance in the future, are deliberately and systematically recreated and explored further. There are even automated systems designed to deliberately cause failures in production systems, such as Chaos Monkey, a system developed by Netflix to randomly take production servers offline, forcing the system to heal itself or die trying.
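The core of the idea is simple enough to sketch. The following toy version in Python is a loose illustration of the principle, not Netflix’s actual implementation; the server pool, kill probability, and terminate action are all stand-in assumptions.

```python
import random

# Stand-in for a pool of production servers.
servers = [f"server-{i}" for i in range(10)]

def terminate(server: str) -> None:
    """Stand-in for actually taking a production instance offline."""
    print(f"Terminating {server}: the system must now heal itself or die trying.")

def chaos_monkey(pool: list[str], kill_probability: float = 0.1) -> None:
    """Randomly kill instances so that failure-recovery paths are
    exercised continuously, instead of only during real outages."""
    for server in pool:
        if random.random() < kill_probability:
            terminate(server)

chaos_monkey(servers)
```

Because the terminations are random and ongoing, recovery machinery that would otherwise rot untested stays exercised, and hidden risks surface on the engineers’ schedule rather than during a real outage.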

The glimpses of perpetual beta that users can see are dwarfed by unseen backstage experimentation.

This is neither perverse nor masochistic: it is necessary to uncover hidden risks in experimental ideas early, and to quickly resolve gridlocks with data.

The origins of this curious philosophy lie in what is known as the release early, release often (RERO) principle, usually attributed to Linus Torvalds, the primary architect of the Linux operating system. The idea is exactly what it sounds like: releasing code as early as possible, and as frequently as possible while it is actively evolving.

What makes this possible in software is that most software failures do not have life-threatening consequences.[4] As a result, it is usually faster and cheaper to learn from failure than to attempt to anticipate and accommodate it via detailed planning (which is why the RERO principle is often restated in terms of failure as fail fast).

So crucial is the RERO mindset today that many companies, such as Facebook and Etsy, insist that new hires contribute and deploy a minor change to mission-critical systems on their very first day. Companies that rely on waterfall processes, by contrast, often put new engineers through years of rotating assignments before trusting them with significant autonomy.

To appreciate just how counterintuitive the RERO principle is, and why it makes traditional engineers nervous, imagine a car manufacturer rushing to put every prototype into “experimental” mass production, with the intention of discovering issues through live car crashes. Or supervisors in a manufacturing plant randomly unplugging or even breaking machinery during peak demand periods. Even lean management models in manufacturing do not go this far. Due to their roots in scarcity, lean models at best mitigate the problems caused by waterfall thinking. Truly agile models, on the other hand, do more: they catalyze abundance.

Perhaps the most counterintuitive consequence of the RERO principle is this: where engineers in other disciplines attempt to minimize the number of releases, software engineers today strive to maximize the frequency of releases. The industrial-age analogy here is the stuff of comedy science fiction: an intern launching a space mission just to ferry a single paperclip to the crew of a space station.

This tendency makes no sense within waterfall models, but is a necessary feature of agile models. The only way for execution to track the changing direction of the rough consensus as it pivots is to increase the frequency of releases. Failed experiments can be abandoned earlier, with lower sunk costs. Successful ones can migrate into the product as fast as hidden risks can be squeezed out. As a result, a lightweight sense of direction — rough consensus — is enough. There is no need to navigate by an increasingly unattainable utopian vision.

Which raises an interesting question: what happens when there are irreconcilable differences of opinion that break the rough consensus?



[1] See the 2009 CNET article “Yahoo Mail still king as Gmail lurks.” As of 2015, Gmail has close to a billion users.

[2] Application Programming Interface: a mechanism that allows external parties to programmatically “plug in” to a product.

[3] Technical debt, a notion introduced by Ward Cunningham (the inventor of the wiki) in 1992, is conceptually similar to debt in the economic sense. It usually refers to known pending work, such as replacing temporary expedient hacks with ideal solutions, and “refactoring” to improve inefficiently structured code. The “debt” is the gap between the idealized version of the feature and the one actually in place. In somewhat looser usage, it can also refer, in waterfall processes, to unfinished features in the authoritarian vision that may only exist as stubs in the code or unimplemented specifications. In the context of agile processes, however, all such debt, created through either expedience or incompleteness, is not necessarily “must do” work. If an experimental feature is not actually adopted by users, or is rendered unnecessary by a pivot, there may be no point in replacing an expedient solution with an idealized one. Technical surplus can analogously be thought of as the unanticipated growth opportunities (or optionality in the sense of Nassim Taleb in Antifragile) created by users doing creative and unexpected things with existing features. Such opportunities require an expansion in the vision. The surplus comprises the spillover value of unanticipated uses. As in economics, a project with high technical debt is in a fragile state and vulnerable to zemblanity. A project with high technical surplus is in an antifragile state and open to serendipity.

[4] This is not true of all software, of course: there is a different development regime for code with life-threatening consequences. Code developed in such regimes tends to evolve far more slowly and is often 10 to 30 years behind the curve. This is one reason for the perception that trivial applications dominate the industry: it takes longer for mission-critical code in life-threatening applications to be updated.