Very Short, Very Sharp: Fixing a broken codebase, part II

In part I, I described a scenario of a software engineer hired to improve a decades-old scientific codebase of 200,000 lines. I proposed seven first-aid measures that together would keep things from getting worse, and ensure all new code was of much higher quality than the old. Here in part II, I'll discuss what to do about the older legacy code.

One tempting choice is to rewrite the whole codebase. It would be enormously satisfying to start fresh and indeed it is likely that the second version would be much better, since it could be designed based on everything that has been learned from the first one. The problem with this plan is time. The basic COCOMO model estimates that writing a 200,000 line program takes 625 person-months, with a delivery time of 28 months. Merely rewriting code is easier than writing it from scratch, so this estimate is likely to be on the high side but this is definitely a project that would take person-years, rather than months. And it is unlikely the group of scientists is willing to wait that long. Accordingly, something more selective is called for.

What I have in mind are four measures that will gradually improve the codebase, without the high hurdle of a complete rewrite. These measures will accelerate the improvement that the policies from part I enabled. In particular, the team should undertake

dead code removal,
refactoring training,
a Better than You Found It policy, and
targeted rewrites.

Dead Code Removal
The codebase in this scenario is a couple of decades old. That means it contains a lot of code that isn't needed any more: old projects, failed experiments, and obsolete concerns. This old unused code is a problem because it complicates the codebase, making it more difficult to add new code for new purposes. It should therefore be removed.

Much of this dead code is likely to be hiding behind options and flags. Some of it may also be entirely commented out. The way to identify the code is to look at what configurations are actually run, and thereby determine what options are actually used. It is then possible to find all the options that aren't used, and work with the users (i.e. the scientists) to determine which ones actually have to be kept.

The policy of assigned code ownership (from part I) aids this effort, since the code owners are the people who know their portions best. They are therefore the right people to undertake the dead code removal. The source code control system is also useful, since it makes it possible to return any removed code if it turns out to be needed later. It is therefore possible to be quite aggressive in removing code that is suspected of being dead.

Once all the dead code is removed, the result will be a much smaller and much clearer codebase that is much easier to work with.

Training in Refactoring
The plan is to gradually improve the existing codebase. This will require carefully refactoring the code, restructuring it to be more comprehensible without changing the results it produces. Doing this is not obvious, particularly for people without training and experience in software engineering. A bit of targeted training would therefore be useful.

What I have in mind is a day-long course. It would start with the concept of code smells, signs that indicate trouble. Among these are very long functions, use of global variables, confusing names of functions and variables, poor encapsulation of functionality, repeated code, and an absence of test coverage. The engineer would then show an example of a refactored module, pointing out problems in the original and how they fixed them. Finally, the students would have a chance the try refactoring themselves, using actual code from the codebase.

Books such as Working Effectively with Legacy Code and Refactoring are useful resources for this training. The idea (from Working Effectively) that legacy code is code without tests and that having tests makes confident refactoring possible, is particularly useful and should be included in the course.

Better Than You Found It
The skills the staff developed in the refactoring course will be put to use by instituting a policy of Better Than You Found It. Whenever a developer makes a change to existing code, they should not only write the new code correctly, but also take the time to improve the nearby code. For example, if they are adding code to a function that is already oversized, they should break it up. Or if they are calling a function with a wildly complicated parameter list, they should redesign the function to be more selective in its inputs. It isn't necessary to fix the whole module, or even the whole file, just a portion of it, leaving things a little better than it was before.

Recall that a complete rewrite is not the plan. Instead, by following the policy of Better Than You Found It, the most commonly used parts of the codebase will steadily improve. And the improvement will come organically, implemented by staff that ran into problems and had to understand the old code anyway. The danger in undertaking this policy is staff resistance, since it is requiring them to do work that is not obviously part of whatever they were trying to accomplish. It is therefore important to make the policy both universal and fairly lightweight. Burdens shared equally are easier to bear, and the task of improving the codebase is less onerous if it is done a little at a time. Ideally, with training and experience, the staff will come to see the ugly old code as distasteful, and will want to fix what is obviously broken, the legacy of the bad old days.

The policies of code ownership and mandatory code review, introduced in part I, support this effort by making sure people don't shirk their refactoring responsibilities.

Targeted Rewrites
Diet and exercise will not cure cancer; that takes surgery and chemotherapy. Similarly, gradual refactoring will not fix the most dramatically broken modules in the codebase; they should be completely rewritten. Typically what happens in an evolving codebase is that some modules, though initially sound, are called upon to do ever more and assume responsibilities far beyond what was originally imagined. This tends to be a problem because their underlying architectures were designed for the original purpose, and often aren't changed as their uses evolve. Over time, this requires increasingly convoluted code to implement new features. These are the modules that should be rewritten from scratch with completely new architectures suitable for their new requirements.

The best source of information for identifying the really bad parts of the code is the development staff itself. They will have many stories about scary modules they are reluctant to modify, but sometimes must. And they will not be reluctant to share this information. The bug database from part I will also be useful, since bad modules tend to be continual sources of bugs.

To Summarize
In part I, I described how to keep the team's codebase from getting worse, by making it possible to write new code cleanly. In this article, part II, I proposed four measures for improving the older code. Dead code removal simplified the codebase by removing unnecessary old code entirely. Training in refactoring built the staff's skills in upgrading the older code, and the policy of Better Than You Found It put those skills to work improving the code, gradually. Finally, where gradual improvement wasn't sufficient, I called for targeted rewrites of the most troubled modules. Doing all of this will take time and effort, but the team will have a dramatically improved codebase in a year or so.

Very Short, Very Sharp

Sunday, August 12, 2012

Fixing a broken codebase, part II

1 comment: