Tuesday, April 24, 2012

Hackstability: how big systems can survive their own success

Just read a fascinating article by Venkat here on ribbonfarm, positing a third path for how systems evolve, somewhere between total replacement or reengineering on one side and entropic collapse on the other. Most interesting is his example of city technology, where a major old city teeters forever on the brink of collapse, but because the people who /live/ there keep hacking the systems that are the city, the systems keep working.

I've experienced this on the small scale. While at [fill in major technology company here], we had genuine legacy software: a server that was notoriously both fragile and single-threaded, so that if any of the dozens of computers that talked to it threw it a curve, lost its mind, or whatever, the entire server hung, and the factory stopped moving product.

At the time, my thought was "Well shit, guys, write a new one that isn't single threaded. While you're at it, make the damn thing so it can span cluster nodes and load share automatically too." In the terms of the referenced article, however, paying interest on the technical debt (that is, living with the server's deficiencies from crappy design, kludges, and so forth, by having the support team I was on monitor the thing and restart it as necessary) was cheaper than paying the real cost of designing and implementing a replacement without those deficiencies.
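For the curious, the "monitor it and restart it as necessary" job can be sketched in a few lines of Python. This is only an illustration of the idea, not what we actually ran: the names watchdog, is_healthy, restart, and FakeServer are all made up for the example, and the real probe and restart steps were whatever the support team did by hand.

```python
import time
from typing import Callable, Iterable


def watchdog(is_healthy: Callable[[], bool],
             restart: Callable[[], None],
             checks: int,
             interval_s: float = 0.0,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Probe the server `checks` times, restarting it after any failed probe.

    Returns the number of restarts performed. All names here are
    hypothetical; the probe and restart actions are supplied by the caller.
    """
    restarts = 0
    for _ in range(checks):
        if not is_healthy():
            restart()          # the "interest payment" on the technical debt
            restarts += 1
        sleep(interval_s)      # wait before the next probe
    return restarts


class FakeServer:
    """Stand-in for the legacy server: reports a hang on scripted probes."""

    def __init__(self, hang_on: Iterable[int]):
        self.hang_on = set(hang_on)  # probe numbers at which it is found hung
        self.probes = 0
        self.restarts = 0

    def is_healthy(self) -> bool:
        self.probes += 1
        return self.probes not in self.hang_on

    def restart(self) -> None:
        self.restarts += 1
```

Calling watchdog(server.is_healthy, server.restart, checks=6) against a FakeServer(hang_on={2, 5}) restarts it twice. The point is how little this costs compared to a rewrite, which is exactly why the rewrite never happened.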

The article goes further, though. It talks about how technology like the much-hated server discussed above becomes non-disposable, how it accretes so much interdependency with other things that it /can't/ be ripped out and replaced. And that's where the concept of hackstability comes in. The humans who maintain the server can continue to hack it, so long as their (often undocumented, unorganized) knowledge is preserved across generations of support people despite high turnover, and can either keep the system going indefinitely or at least guide it into a soft landing instead of an uncontrolled crash.

All of this is interesting in terms of computer systems, but the author extends the reach of the concept to /everything technology has colonized/. That is, the whole planet.

Read the article. It's good. Expect to see it in my work at some point. :)

