Sunday, March 01, 2009

Technical Debt

Jeff Atwood wrote an article on his blog Coding Horror yesterday all about paying down your technical debt. This is when you do something "the quick and dirty way", which then costs you "interest" in the future in terms of bug fixes, workarounds when new functionality is needed, and extra time for developers unfamiliar with the code to understand why something was done the way it was. There are certainly times in every developer's life when you have a choice: do something "the right way", which might take weeks to design and implement properly, or do it the easy way, which gets the job done for now but may have consequences later. If you're under a tight deadline, the easy way often wins out — that's your debt.

People often complain about Microsoft Windows being bloated, and that's largely because of technical debt that they can't easily pay off. When they released Windows NT in 1993, they made sure that all existing Windows and DOS programs would still run. That decision saved them — who's going to upgrade to a brand-new OS when there are no programs or drivers for it and none of your existing stuff will work? — but they incurred a huge debt because of it. Backwards compatibility has always been a huge issue for Microsoft — it was only recently (2007) that they released an OS (Vista) that won't run 16-bit DOS software from the '80s. I cannot imagine how much of the Windows source code is dedicated to running legacy software.

I love this "technical debt" metaphor, as we've gone through it a couple of times on our mobile database product, SQL Anywhere, most notably a few years ago on SQL Anywhere version 10.

One of the advantages of SQL Anywhere is the way we save data in the database file. We do it in such a way that a database created on any supported platform can be copied and used on any other supported platform. Also, if you create your database with one version of our product, you can continue to use it when we release updates for that version, or even completely new versions. Version 9.0.2 of our server, released in 2005, can still run databases created with Watcom SQL 3.2, released in 1992. I remember my time as an Oracle DBA: every time we upgraded Oracle, we had to "fix" the database, and by "fix" I mean we had to rebuild it or upgrade it or something; I don't remember exactly what, but we had to do something. We also had Oracle on our test server, which was a different platform from the production server, so we couldn't just copy the production database over for testing or debugging purposes, which was quite a pain.
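
To give a rough idea of how that kind of cross-platform portability is usually achieved, here's a small sketch in C. The trick is simply to pick one fixed byte order for everything written to disk, no matter what the host CPU's native order is. The function names and details below are made up for illustration; this is not the actual SQL Anywhere file format code.

#include <stdint.h>
#include <stdio.h>

/* Illustration only: write and read 32-bit values in a fixed (little-endian)
 * byte order, so the bytes on disk look the same whether the file was
 * created on x86, SPARC, ARM, or anything else. */

static void write_u32_le(FILE *f, uint32_t value)
{
    unsigned char b[4];
    b[0] = (unsigned char)(value & 0xFF);
    b[1] = (unsigned char)((value >> 8) & 0xFF);
    b[2] = (unsigned char)((value >> 16) & 0xFF);
    b[3] = (unsigned char)((value >> 24) & 0xFF);
    fwrite(b, 1, sizeof(b), f);
}

static uint32_t read_u32_le(FILE *f)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof(b), f) != sizeof(b))
        return 0;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

int main(void)
{
    /* Round-trip a value through a file; the on-disk bytes are identical
     * on every platform, so the file itself can be copied anywhere. */
    FILE *f = fopen("page.dat", "wb");
    if (!f) return 1;
    write_u32_le(f, 0xCAFEBABE);
    fclose(f);

    f = fopen("page.dat", "rb");
    if (!f) return 1;
    printf("read back: 0x%08X\n", (unsigned)read_u32_le(f));
    fclose(f);
    return 0;
}

Copying the resulting file between platforms just works, because nothing in it depends on the machine that wrote it.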

Anyway, while this was a very convenient feature, we did accrue some "technical debt". This is not quite the same as described above, in that we never took the "quick and dirty way", but we still had to have code in the server to support features that had been removed from the product and very old bugs that had long been fixed. After six major versions and thirteen years, there were a lot of these. After much discussion, we decided to take the big plunge with the 10.0 release (known internally as "Jasper" — the last few releases have all had code names from ski resorts, "Aspen", "Vail", "Banff", "Panorama", and the next one is "Innsbruck"), since we were adding a ton of other new functionality with that release.

The decision: version 10 servers would not run databases created with version 9 or earlier servers. Everyone would have to do a full unload of all their data and reload it into a new database when upgrading to version 10, and they'd have to do this for all their databases. This would allow us to remove thousands of lines of code from the product, making it smaller, and since we have far fewer cases of "what capabilities does this database have?", the code can be more efficient. As a simple example, we now know that every database the server can run supports strong encryption, checksums, clustered indexes, and compressed strings, among others, so we don't need to check for those capabilities before using them. There are a lot more assumptions we can make about the layout of the database that make the code simpler, smaller, and more efficient. We can also add new features that might have clashed with old databases. We knew that the rebuild itself might be inconvenient, and that upgrading to version 10 wouldn't be nearly as seamless as previous upgrades, but we also knew that once the initial pain of the rebuild was over with, life would be much better for everyone. We even put a lot of work into streamlining the rebuild process so that it was as fast and simple as possible.
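
To make the "what capabilities does this database have?" problem concrete, here's a small sketch of the pattern. The structure and flag names are invented for illustration, not actual SQL Anywhere code, but the shape is the same: every feature that only some database files support means another branch in the server, and once the old files are gone, the branch and the flag can both be deleted.

#include <stdbool.h>
#include <stdio.h>

/* Invented for illustration; not actual SQL Anywhere code. */
typedef struct {
    bool has_checksums;
    bool has_clustered_indexes;
    bool has_strong_encryption;
} db_capabilities;

/* Pre-10 style: the server has to ask each database file what it supports. */
static void verify_page_pre10(const db_capabilities *caps)
{
    if (caps->has_checksums) {
        printf("verify page checksum\n");
    } else {
        printf("no checksums in this file format; skip verification\n");
    }
}

/* Version 10 style: every runnable database is known to have checksums,
 * so the branch (and eventually the capability flag itself) disappears. */
static void verify_page_v10(void)
{
    printf("verify page checksum\n");
}

int main(void)
{
    db_capabilities old_db = { false, false, false };
    verify_page_pre10(&old_db);  /* old code path, full of special cases */
    verify_page_v10();           /* new code path, one assumption fewer  */
    return 0;
}

Multiply that one branch by every feature added over six major versions and thirteen years, and you get a sense of how much code the version 10 cleanup let us delete.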

As you can imagine, there was some resistance to this, and I'm sure product management had to handle more than one call from a customer asking "I have to do what with my multi-terabyte database?", but to their credit, they stuck to their guns and told the customers that yes, we know it's inconvenient, but it's really for the best, and you'll appreciate it once the rebuild is done. Or perhaps they blamed it on us, telling the customers "We know it's a pain, but engineering won't budge. They're determined to do this." Either way, it happened, and we did get some more bug reports because of problems with the rebuilds, but for the most part, things went pretty well. That pain paid off the technical debt that we'd accumulated over the previous decade.

Of course, we've since released version 11, which added new stuff to the database file, and we're working on version 12 which adds even more, so now some of those "if the database file has this capability, then do something, otherwise do something else" conditions are creeping back into the product. So far, there aren't a ton of them, so our current interest payments are pretty low, but perhaps in five or six more versions we'll have accumulated enough technical debt that we'll have to bite the bullet and pay it off again.
